We exhibit an orthogonal set of product states of two three-state particles that nevertheless cannot be reliably distinguished by a pair of separated observers ignorant of which of the states has been presented to them, even if the observers are allowed any sequence of local operations and classical communication between the separate observers. It is proved that there is a finite gap between the mutual information obtainable by a joint measurement on these states and a measurement in which only local actions are permitted. This result implies the existence of separable superoperators that cannot be implemented locally. A set of states involving three two-state particles is found that also appears to be nonmeasurable locally. These and other multipartite states are classified according to the entropy and entanglement costs of preparing and measuring them by local operations.
I. INTRODUCTION

The most celebrated manifestations of quantum nonlocality arise from entangled states: states of a compound quantum system that admit no description in terms of states of the constituent parts. Entangled states, by their experimentally confirmed violations of Bell-type inequalities, provide strong evidence for the validity of quantum mechanics, and they can be used for novel forms of information processing, such as quantum cryptography [1], entanglement-assisted communication [2,3], and quantum teleportation [4], and for fast quantum computations [5,6], which pass through entangled states on their way from a classical input to a classical output. A related feature of quantum mechanics, also giving rise to nonclassical behavior, is the impossibility of cloning [7] or reliably distinguishing nonorthogonal states. Quantum systems that for one reason or another behave classically (e.g., because they are of macroscopic size or are coupled to a decohering environment) can generally be described in terms of a set of orthogonal, unentangled states.

In view of this, one might expect that if the states of a quantum system were limited to a set of orthogonal product states, the system would behave entirely classically and would not exhibit any nonlocality. In particular, if a compound quantum system, consisting of two parts A and B held by separated observers (Alice and Bob), were prepared by another party in one of several mutually orthogonal, unentangled states $\psi_1, \psi_2, \ldots, \psi_n$ unknown to Alice and Bob, then it ought to be possible to reliably discover which state the system was in by locally measuring the separate parts. Also, it ought to be possible to clone the state of the whole by separately duplicating the state of each part. We show that this is not the case, by exhibiting sets of orthogonal, unentangled states $\{\psi_i\}$ of two-party and three-party systems such that

• the states $\{\psi_i\}$ can be reliably distinguished by a joint measurement on the entire system, but not by any sequence of local measurements on the parts, even with the help of classical communication between the observers holding the separate parts;

• the cloning operation $\psi_i \to \psi_i \otimes \psi_i$ cannot be implemented by any sequence of local operations and classical communication.



Some of the features of this new kind of nonlocality appeared first in [8], which presented a set of orthogonal states
of a bipartite system that cannot be cloned if Alice and Bob cannot communicate at all. However, the states in [8] 
can be cloned if Alice and Bob use one-way classical communication. 

Many more of the nonlocal properties considered in the present work were anticipated by the measurement protocol 
introduced by Peres and Wootters [9]. Their construction indicates the existence of a nonlocality dual to that 
manifested by entangled systems: entangled states must be prepared jointly, but exhibit anomalous correlations 
when measured separately; the Peres-Wootters states are unentangled, and can be prepared separately, but exhibit
anomalous properties when measured jointly. We note that such anomalies are at the heart of recent constructions
for attaining the highest possible capacity of a quantum channel for the transmission of classical data [10-13]. 

In the Peres-Wootters scheme, the preparator chooses one of three linear polarization directions, 0, 60, or 120 degrees,
and gives Alice and Bob each one photon polarized in that direction. Their task is to determine which of the three 
polarizations they have been given by a sequence of separate measurements on the two photons, assisted by classical 
communication between them, but they are not allowed to perform joint measurements, nor to share entanglement, 
nor to exchange quantum information. 

Of course, because the three two-photon states are nonorthogonal, they cannot be cloned or reliably distinguished, 
even by a joint measurement. However, Peres and Wootters performed numerical calculations which provided evidence 
(more evidence on an analogous problem was provided by the work of Massar and Popescu [14]) indicating that a 
single joint measurement on both particles yielded more information about the states than any sequence of local 
measurements. Thus unentangled nonorthogonal states appear to exhibit a kind of quantitative nonlocality in their 
degree of distinguishability. The discovery of quantum teleportation, incidentally, grew out of an attempt to identify
what other resource, besides actually being in the same place, would enable Alice and Bob to make an optimal
measurement of the Peres-Wootters states.

Another antecedent of the present work is a series of papers [15-17] resulting in the conclusion [17] that several 
forms of quantum key distribution [18] can be viewed as involving orthogonal states of a serially-presented bipartite 
system. These states cannot be reliably distinguished by an eavesdropper because she must let go of the first half of 
the system before she receives the second half. In this example, the serial time-ordering is essential: if, for example, 
the two parts were placed in the hands of two separate classically-communicating eavesdroppers, rather than being 
serially presented to one eavesdropper, the eavesdroppers could easily cooperate to identify the state and break the 
cryptosystem. 

In this paper we report a form of nonlocality qualitatively stronger than either of these antecedents. We extensively 
analyze an example in which Alice and Bob are each given a three-state particle, and their goal is to distinguish 
which of nine product states, $\psi_i = |\alpha_i\rangle \otimes |\beta_i\rangle$, $i = 1, \ldots, 9$, the composite 3×3 quantum system was prepared in.
Unlike the Peres-Wootters example, these states are orthogonal, so the joint state could be identified with perfect
reliability by a collective measurement on both particles. However, the nine states are not orthogonal as seen by 
Alice or Bob alone, and we prove that they cannot be reliably distinguished by any sequence of local measurements, 
even permitting an arbitrary amount of classical communication between Alice and Bob. We call such a set of states 
"locally immeasurable" and give other examples, e.g., a set of two mixed states of two two-state particles (qubits), 
and sets of four or eight pure states of three qubits, which apparently cannot be reliably distinguished by any local 
procedure despite being orthogonal and unentangled. 

In what sense is a locally immeasurable set of states "nonlocal?" Surely not in the usual sense of exhibiting 
phenomena inexplicable by any local hidden variable (LHV) model. Because the $\psi_i$ are all product states, it suffices
to take the local states $\alpha_i$ and $\beta_i$, on Alice's and Bob's side respectively, as the local hidden variables. The standard
laws of quantum mechanics (e.g., Malus' law), applied separately to Alice's and Bob's subsystems, can then explain
any local measurement statistics that may be observed. However, an essential feature of classical mechanics, not 
usually mentioned in LHV discussions, is the fact that variables corresponding to real physical properties are not 
hidden, but in principle measurable. In other words, classical mechanical systems admit a description in terms of 
local unhidden variables. The locally immeasurable sets of quantum states we describe here are nonlocal in the 
sense that, if we believe quantum mechanics, there is no local unhidden variable model of their behavior. Thus a 
measurement of the whole can reveal more information about the system's state than any sequence of classically 
coordinated measurements of the parts. 

The inverse of local measurement is local preparation, the mapping from a classically provided index $i$ to the
designated state $\psi_i$ by local operations and classical communication. If the states $\psi_i$ are unentangled, local preparation
is always possible, but for any locally immeasurable set of states this preparation process is necessarily irreversible
in the thermodynamic sense, i.e., possible only when accompanied by a flow of entropy into the environment. Of 
course if quantum communication or global operations were allowed during preparation, the preparation could be 
done reversibly, provided that the states being prepared are orthogonal. 

By eliminating certain states from a locally immeasurable set (such as $\{\psi_1, \ldots, \psi_9\}$ in Eq. (3) below), we obtain what
appears to be a weaker kind of nonlocality, in which the remaining subset of states is both locally preparable and 
locally measurable, but in neither case (so far as we have been able to discover) by a thermodynamically reversible 
process. Curiously, in these situations, the entropy of preparation (by the best protocols we have been able to find) 
exceeds the entropy of measurement. 

Besides entropies of preparation and measurement we have explored other quantitative measures of nonlocality for 
unentangled states. One obvious measure is the amount of quantum communication that would be needed to render 
an otherwise local measurement process reliable. Another is the mutual information deficit when one attempts to 
distinguish the states by the best local protocol. Finally, one can quantify the amount of advice, from a third party
who knows i, that would be sufficient to guide Alice and Bob through an otherwise local measurement procedure. 

The results of this paper also have a bearing on, and were directly motivated by, a question which arose recently in 
the context of a different problem in quantum information processing. This is the problem of entanglement purification, 
in which Alice and Bob have a large collection of identical bipartite mixed states that are partially entangled. Their 
object is to perform a sequence of operations locally, i.e., by doing quantum operations on their halves of the states and 
communicating classically, and end up with a smaller number of pure, maximally entangled states. Recently, bounds 
on the efficiency of this process have been studied by Rains [19] and by Vedral and Plenio [20]; other constraints on 
entanglement purification by separable superoperators have recently been studied by Horodecki et al. [21]. 

In their work, they represent the sequence of operations using the theory of superoperators, which can describe any 
combination of unitary operations, interactions with an ancillary quantum system or with the environment, quantum 
measurement, classical communication, and subsequent quantum operations conditioned on measurement results. In 
the operator-sum representation of superoperators developed by Kraus and others, the general final state $S(\rho)$ of the
density operator of the system is written as a function of the initial state $\rho$ as:

$$S(\rho) = \sum_k S_k \rho S_k^\dagger. \tag{1}$$

The $S_k$ operators appearing in this equation will be referred to as "operation elements." A trace-decreasing superoperator satisfies the condition $\sum_k S_k^\dagger S_k \le 1$ and is appropriate for describing the effect of arbitrary quantum measurements on the system ([22], Sec. III), while a trace-preserving superoperator, specified by $\sum_k S_k^\dagger S_k = 1$, describes a general time evolution of the density operator if a measurement is not made or its outcomes are ignored [23]. Reference [24] has a useful general review of the superoperator formalism.
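The operator-sum form of Eq. (1) and the two completeness conditions can be checked numerically. The following is a minimal pure-Python sketch; the helper names and the amplitude-damping operators `S0`, `S1` are illustrative assumptions, not taken from the text.

```python
import math


def matmul(x, y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def dagger(x):
    """Conjugate transpose."""
    return [[complex(x[j][i]).conjugate() for j in range(len(x))]
            for i in range(len(x[0]))]


def add(x, y):
    return [[x[i][j] + y[i][j] for j in range(len(x[0]))] for i in range(len(x))]


def zeros(n):
    return [[0j] * n for _ in range(n)]


def trace(x):
    return sum(x[i][i] for i in range(len(x)))


def apply_superoperator(ops, rho):
    """S(rho) = sum_k S_k rho S_k^dagger, as in Eq. (1)."""
    out = zeros(len(ops[0]))            # output dimension = rows of S_k
    for s in ops:
        out = add(out, matmul(matmul(s, rho), dagger(s)))
    return out


def completeness(ops):
    """sum_k S_k^dagger S_k: equals 1 if trace-preserving, <= 1 if trace-decreasing."""
    total = zeros(len(ops[0][0]))       # input dimension = columns of S_k
    for s in ops:
        total = add(total, matmul(dagger(s), s))
    return total


# Hypothetical example: qubit amplitude damping with decay probability gamma.
gamma = 0.3
S0 = [[1, 0], [0, math.sqrt(1 - gamma)]]
S1 = [[0, math.sqrt(gamma)], [0, 0]]
rho = [[0.5, 0.5], [0.5, 0.5]]          # density matrix of |0+1>

out = apply_superoperator([S0, S1], rho)
print(trace(out).real)                   # 1 up to rounding: the trace is preserved
print(completeness([S0]))                # keeping only S0 gives a trace-decreasing map
```

Dropping `S1` leaves an element set whose completeness sum is strictly less than the identity, which is the measurement-branch (trace-decreasing) case described above.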

To impose the constraint that Alice and Bob act only locally, Rains, and Vedral and Plenio, restricted their attention 
to separable superoperators, in which the operation elements have a direct product form involving an Alice operation 
and a Bob operation: 

$$S_k = A_k \otimes B_k. \tag{2}$$

We will show in Sec. II B (see also [22], Sec. IX.C) that all operations Alice and Bob can perform bilocally during
entanglement purification, i.e., by local quantum operations and classical communication, can be written in this
separable form. This was enough for the derivation of valid upper bounds on the efficiency
of entanglement purification. But the natural question which this led to is the converse, that is, can all separable 
superoperators be implemented by bilocal operations? 

The answer to this question is definitely no, as a result of the examples which we analyze in this paper. Quantum
measurements are a subset of the superoperators, and measurements involving only product states are separable
superoperators. Thus, our proof that some unentangled states cannot be distinguished locally shows that some separable
superoperators cannot be implemented by only separate operations by Alice and Bob with classical communication
between them. This indicates that any further investigations of entanglement purification protocols involving separable
superoperators will have to be performed with some caution.

This paper is organized as follows: Section II presents the 3x3 example and sketches the proof that these states 
cannot be distinguished by local measurements. Appendix B gives many of the important details of this proof, and 
Appendix A supplies a crucial technical detail, that all superoperators can be decomposed into a sequence of very 
weak measurements. Section III shows how the measurement can be done locally if some states are excluded, and 
presents the best measurement strategy we have found for distinguishing (imperfectly) all nine states. Section IV 
shows how the measurement can be done for the 3x3 example if entanglement is supplied. Section V analyzes the 
thermodynamics of local state measurement, studying the heat generated in measurement and in state preparation; 
Appendix C gives some details. Section VI analyzes a three-party 2x2x2 example involving 8 pure states. Section 
VII gives other compact examples (4 pure states in a 2 x 2 x 2 system, 2 mixed states in a 2 x 2 system) and poses 
some questions for the future (Appendix D gives details of a specific problem considered there). 

II. A SEPARABLE MEASUREMENT WHICH IS NOT BILOCAL 

A. The ensemble of states in a 3x3 Hilbert space 

We will consider the following complete, orthonormal set of product states $\psi_i = |\alpha_i\rangle \otimes |\beta_i\rangle$. They live in a nine-dimensional Hilbert space, with Alice and Bob each possessing three dimensions. We will use the notation $|0\rangle$, $|1\rangle$, and $|2\rangle$ for the bases of Alice's and Bob's Hilbert spaces. The orthonormal set is
Here $|0 \pm 1\rangle$ stands for $\frac{1}{\sqrt{2}}(|0\rangle \pm |1\rangle)$, etc. Figure 1 shows a suggestive graphical way to depict the 9 states of Eq. (3) in the 3×3 Hilbert space of Alice and Bob. The four dominoes represent the four pairs of states that involve superpositions of the basis states. State $\psi_1$ is clearly special in that it involves no such superposition.
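Because the display of Eq. (3) is not reproduced here, the sketch below assumes the standard domino assignment $\psi_1 = |1\rangle|1\rangle$, $\psi_{2,3} = |0\rangle|0\pm1\rangle$, $\psi_{4,5} = |2\rangle|1\pm2\rangle$, $\psi_{6,7} = |1\pm2\rangle|0\rangle$, $\psi_{8,9} = |0\pm1\rangle|2\rangle$; treat that assignment as an assumption to be checked against Eq. (3). Under it, the sketch verifies that the nine product states are orthonormal while Alice's local factors $\alpha_i$ alone are not.

```python
import itertools
import math

R = 1 / math.sqrt(2)
K = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]      # |0>, |1>, |2>


def sup(u, v, sign):
    """(|u> + sign |v>) / sqrt(2), e.g. |0+1> or |0-1>."""
    return [R * (x + sign * y) for x, y in zip(u, v)]


def kron(u, v):
    """Tensor product |u> (x) |v> of two vectors."""
    return [x * y for x in u for y in v]


def inner(u, v):
    return sum(complex(x).conjugate() * y for x, y in zip(u, v))


# ASSUMED domino assignment for Eq. (3), as (alpha_i, beta_i), i = 1..9.
pairs = [
    (K[1], K[1]),
    (K[0], sup(K[0], K[1], +1)), (K[0], sup(K[0], K[1], -1)),
    (K[2], sup(K[1], K[2], +1)), (K[2], sup(K[1], K[2], -1)),
    (sup(K[1], K[2], +1), K[0]), (sup(K[1], K[2], -1), K[0]),
    (sup(K[0], K[1], +1), K[2]), (sup(K[0], K[1], -1), K[2]),
]
psi = [kron(a, b) for a, b in pairs]

# Gram matrix of the nine 9-dimensional product states.
gram = [[inner(u, v) for v in psi] for u in psi]
orthonormal = all(abs(gram[i][j] - (1 if i == j else 0)) < 1e-12
                  for i in range(9) for j in range(9))

# Overlaps of Alice's factors alone: some lie strictly between 0 and 1.
alice = [abs(inner(a, b)) for (a, _), (b, _) in itertools.combinations(pairs, 2)]
print(orthonormal, sorted(set(round(x, 6) for x in alice)))
```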




FIG. 1. A graphical depiction of the nine orthogonal states of Eq. (3) as a set of dominoes. 



B. The measurement 

We will show that the separable superoperator $S(\rho) = \sum_i S_i \rho S_i^\dagger$, consisting of the projection operators

$$S_i = |i\rangle_A |i\rangle_B \langle\psi_i| \tag{4}$$

cannot be performed by local operations of Alice and Bob, even allowing any amount of classical communication 
between them. In Eq. (4), the output Hilbert space is different from the input; it is a space in which both Alice and 
Bob separately have a complete and identical record of the outcome of the measurement. See Sec. VII for a discussion 
of why we use the particular form of Eq. (4) for the operator; note that the input state need not be present at the 
output in Eq. (4). 

Since this superoperator corresponds to a standard von Neumann measurement, we can equally well consider the
problem in the form of the following game: Alice and Bob are presented with one of the nine orthonormal product
states (for the time being with equal prior probabilities, let us say; this is not important, it is only important that
the prior probabilities of states $\psi_2$ through $\psi_9$ be nonzero). Their job is to agree on a measurement protocol with
which they can determine, with vanishingly small error, which of the nine states it is, adhering to a bilocal protocol.

Let us characterize bilocal protocols a little more explicitly. Our discussion will apply both to bilocal measurements
and to bilocal superoperators (in which the measurement outcomes may be traced out). By prior agreement one of
the parties, let us say Alice, initiates the sequence of operations. The most general operation that she can perform
locally is specified by the set of operation elements
$$A_{r1} \otimes I. \tag{5}$$

We will immediately specialize to the case where each value $r1$ labels a distinct "round 1" measurement outcome
which she will report to Bob, since no protocol in which she withheld any of this information from Bob could have
greater power. She cannot act on Bob's state, so her operators are always the identity $I$ on his Hilbert space. $A_{r1}$
can also include any unitary operation that Alice may perform before or after the measurement. Note also that the
operator $A_{r1}$ may not be a square matrix; the final Hilbert space dimension may be smaller (but this would never be
useful) or larger (because of the introduction of an ancilla) than the original.
After the record $r1$ is reported to Bob, he does his own operation

$$I \otimes B_{r2}(r1). \tag{6}$$

The only change from round 1 is that Bob's operations can be explicit functions of the measurements reported in 
that round. Now, the process is repeated. The overall set of operation elements specifying the net operation after n 
rounds is given by multiplying out a sequence of these operations: 

$$S_m = A_m \otimes B_m, \tag{7}$$
$$A_m = A_{rn}(r1, r2, \ldots, r(n-1)) \cdots A_{r3}(r1, r2)\, A_{r1}, \tag{8}$$
$$B_m = B_{r(n-1)}(r1, r2, \ldots, r(n-2)) \cdots B_{r4}(r1, r2, r3)\, B_{r2}(r1). \tag{9}$$

Here the label m can be thought of as a concatenation of all the data collected through the n rounds of measurement: 

$$m = r1 : r2 : r3 : \ldots : rn. \tag{10}$$

Equations (7-9) demonstrate the fact that all bilocal operations are also separable operations. It is the converse 
statement that we are about to disprove for the operator corresponding to the nine-state measurement, Eq. (4). 
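The collapse of a multi-round bilocal protocol into one product element in Eqs. (7)-(9) rests on the mixed-product property $(X \otimes Y)(Z \otimes W) = XZ \otimes YW$. As an illustrative check, random 2×2 operators stand in for $A_{r1}$, $B_{r2}(r1)$, and $A_{r3}(r1, r2)$; these are hypothetical placeholders, with the dependence on earlier records fixed by the random seed.

```python
import random


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def kron(x, y):
    """Kronecker product x (x) y of two matrices."""
    return [[x[i][j] * y[k][l] for j in range(len(x[0])) for l in range(len(y[0]))]
            for i in range(len(x)) for k in range(len(y))]


def eye(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def random_op(n, rng):
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)] for _ in range(n)]


rng = random.Random(7)
I2 = eye(2)
A_r1 = random_op(2, rng)           # round 1: Alice
B_r2 = random_op(2, rng)           # round 2: Bob, chosen after seeing r1
A_r3 = random_op(2, rng)           # round 3: Alice, chosen after seeing r1, r2

# Rounds act right to left, each party's element padded with the identity.
bilocal = matmul(kron(A_r3, I2), matmul(kron(I2, B_r2), kron(A_r1, I2)))
separable = kron(matmul(A_r3, A_r1), B_r2)     # S_m = A_m (x) B_m, Eq. (7)

max_diff = max(abs(bilocal[i][j] - separable[i][j]) for i in range(4) for j in range(4))
print(max_diff)   # zero up to floating-point rounding
```

The check only confirms the forward direction (bilocal implies separable); it says nothing about the converse, which is what the rest of this section addresses.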

We can get some intuitive idea of why it will be hard for Alice and Bob to perform Eq. (4) by local operations by 
noting the result if Alice and Bob perform simple, local von Neumann measurements in any of their rounds. These 
measurements can be represented on the "tic-tac-toe" board of Fig. 1 as simple horizontal or vertical subdivisions of 
the board. The fact that any such subdivision cuts apart one of the dominoes shows very graphically that after such 
an operation the distinguishability of the states is spoiled. This spoiling occurs in any local bases, and is more formally 
just a reflection of the fact that the ensemble of states as seen by Alice alone, or by Bob alone, is nonorthogonal. 
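The domino-cutting picture can be checked for the simplest case: a von Neumann measurement in the standard basis $\{|0\rangle, |1\rangle, |2\rangle\}$ on one side. The sketch assumes the standard domino assignment for Eq. (3) ($\psi_1 = |1\rangle|1\rangle$, $\psi_{2,3} = |0\rangle|0\pm1\rangle$, $\psi_{4,5} = |2\rangle|1\pm2\rangle$, $\psi_{6,7} = |1\pm2\rangle|0\rangle$, $\psi_{8,9} = |0\pm1\rangle|2\rangle$), which is an assumption here. For each outcome $k$ it asks whether two surviving states are left nonorthogonal. It covers only the standard basis, not the arbitrary local bases referred to in the text.

```python
import math

R = 1 / math.sqrt(2)
K = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]


def sup(u, v, sign):
    return [R * (x + sign * y) for x, y in zip(u, v)]


def inner(u, v):
    return sum(complex(x).conjugate() * y for x, y in zip(u, v))


# ASSUMED domino assignment for Eq. (3), as (alpha_i, beta_i).
pairs = [
    (K[1], K[1]),
    (K[0], sup(K[0], K[1], +1)), (K[0], sup(K[0], K[1], -1)),
    (K[2], sup(K[1], K[2], +1)), (K[2], sup(K[1], K[2], -1)),
    (sup(K[1], K[2], +1), K[0]), (sup(K[1], K[2], -1), K[0]),
    (sup(K[0], K[1], +1), K[2]), (sup(K[0], K[1], -1), K[2]),
]


def spoiled_outcomes(pairs, side):
    """Standard-basis measurement by 'A' (Alice) or 'B' (Bob): return the
    outcomes k after which two surviving states are no longer orthogonal."""
    measured = 0 if side == "A" else 1
    spoiled = []
    for k in range(3):
        survivors = [p for p in pairs if abs(p[measured][k]) > 1e-12]
        # Every survivor's measured factor collapses to |k>, so the overlap of
        # two normalized post-measurement states is that of the other factors.
        others = [p[1 - measured] for p in survivors]
        if any(abs(inner(u, v)) > 1e-12
               for i, u in enumerate(others) for v in others[i + 1:]):
            spoiled.append(k)
    return spoiled


print(spoiled_outcomes(pairs, "A"), spoiled_outcomes(pairs, "B"))  # [0, 1, 2] [0, 1, 2]
```

Every outcome on either side leaves some domino pair with identical unmeasured factors, matching the tic-tac-toe picture.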

However, it is not sufficient to show the impossibility of performing Eq. (4) using a succession of local von Neumann 
measurements, as Alice and Bob have available to them an infinite set of weak measurement strategies [25]. Much 
more careful reasoning is required to rule out any such strategy. In the remainder of this section we present the details
of this proof, which also yields an upper bound on the amount of information Alice and Bob can obtain when
attempting to perform the nine-state measurement bilocally.

C. Summary of the proof 

We assume that Alice and Bob have settled on a bilocal protocol with which they will attempt to complete the
measurement as well as possible. We identify the moment in the execution of this measurement when Alice and
Bob have accumulated a specific amount of partial information. We will have to show that it is always possible to
identify this moment, either in Alice and Bob's protocol or in an equivalent protocol which can always be derived
from theirs. We then show, based on the specific structure of the nine states, that at this moment the nine possible
input states must have become nonorthogonal by a finite amount. We then present an information-theoretic analysis
of the mutual information obtainable in the complete measurement, and show, using an accessible-information bound,
that the mutual information obtainable by Alice and Bob bilocally is less, by a finite amount, than the information
obtained from a completely nonlocal measurement.

Now we present the steps of this proof in detail. 

D. Information accumulation and the modified continuous protocol 

If the measurement has proceeded to a point where measurement record $m$ has been obtained, Bayes' theorem gives
the probability $p(\psi_i|m)$ that the input state was $\psi_i$:

$$p(\psi_i|m) = \frac{p(m|\psi_i)\, p(\psi_i)}{\sum_j p(m|\psi_j)\, p(\psi_j)}. \tag{11}$$
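Eq. (11) is ordinary Bayesian updating. A minimal sketch with exact rational arithmetic follows; the function name `posterior` and the likelihood values are made up for illustration.

```python
from fractions import Fraction


def posterior(likelihoods, priors=None):
    """Eq. (11): p(psi_i|m) = p(m|psi_i) p(psi_i) / sum_j p(m|psi_j) p(psi_j)."""
    n = len(likelihoods)
    if priors is None:
        priors = [Fraction(1, n)] * n          # equal priors, which cancel
    joint = [lik * pr for lik, pr in zip(likelihoods, priors)]
    total = sum(joint)
    if total == 0:
        raise ValueError("record m is impossible under every state")
    return [j / total for j in joint]


# Hypothetical likelihoods p(m|psi_i) for one record m and nine states.
likelihoods = [Fraction(1, 2)] + [Fraction(1, 4)] * 8
post = posterior(likelihoods)
print(post[0], post[1])    # 1/5 1/10
```

With equal priors the result depends only on the ratios of the likelihoods, which is why the priors drop out of Eq. (11).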



We take all prior probabilities $p(\psi_i)$ to be equal to 1/9, so they drop out of this equation. The measurement
probabilities $p(m|\psi_i)$ are given by the standard formula

$$p(m|\psi_i) = \mathrm{Tr}\left(S_m |\psi_i\rangle\langle\psi_i| S_m^\dagger\right) = \langle\psi_i|S_m^\dagger S_m|\psi_i\rangle. \tag{12}$$

Here $S_m$ is the operation element of Eq. (7); the quantum state in Alice's and Bob's possession has been transformed
to

$$|\phi_{i,m}\rangle = \frac{S_m|\psi_i\rangle}{\sqrt{p(m|\psi_i)}}. \tag{13}$$

We imagine monitoring these posterior probabilities every time a new round is added to the measurement record in Eq.
(10). We divide the entire measurement into two stages, I and II; "stage I" of the measurement is declared to be
complete when $p(\psi_i|m)$, for some $i$, reaches a particular value (the choice of this value is discussed in detail in the
next subsection). "Stage II" is defined as the entire operation from the end of stage I to the completion of the protocol.

There is a problem with this, however: the measurement record changes by discrete amounts on each round, and
these probabilities may well jump discontinuously when a new datum is appended to the measurement record of
Eq. (10). Thus the probabilities $p(\psi_i|m)$ will likely never attain any particular value, but will jump past it at some
round. The probabilities would evolve continuously only if Alice and Bob agreed on a protocol involving only weak
measurements, for which all the $A_{rk}$ and $B_{rk}$ of Eqs. (8, 9) are approximately proportional to the identity operator.
But, in an attempt to thwart the proof about to be given, Alice and Bob may agree on a protocol that has both weak
measurements and strong measurements (for which the operators of Eqs. (8, 9) are not approximately proportional
to the identity).

However, such a strategy will never help Alice and Bob, because for any bilocal measurement protocol they formulate,
involving any combination of weak and strong measurements, there exists a modified protocol involving only weak
measurements for which the amount of information extracted by the overall measurement is exactly the same. For
this modified protocol an appropriate completion point for "stage I" of the measurement can always be identified.
Thus we can prove, by the steps described below, that the modified protocol cannot be completed successfully by
bilocal operations, and we give a bound on the attainable mutual information of such a measurement. But since the
modified protocol is constructed to have the same measurement fidelity as the original one, this proves that no
protocol, involving any combination of weak and strong measurements, can attain perfect measurement fidelity.

The modified protocol is created in a very simple way: it proceeds through exactly the same steps as the original
protocol, except that at the point where the result of a strong measurement is about to be reported to the other party
by transmission through the classical channel, the strong measurement record, treated as a quantum-mechanical
object, is itself subjected to a long sequence of very weak measurements. The outcomes of these weak measurements
are reported, one at a time, to the other party and appended to the measurement record in Eq. (10).

The precise construction of this weak-measurement sequence is described in Appendix A. The weak measurements
are designed so that in their entirety they give almost perfect information about the outcome of the strong measurement
(the strong measurement outcome itself can be reported at the end of this sequence as a confirmation). So the recipient
of this stream of reports need only wait until the weak measurements are done to know the actual (strong) measurement
outcome and proceed with the next step of the original protocol. But, except in cases with vanishingly small probability,
the information contained in the accumulating measurement record grows continuously.

To conclude this discussion of the modified measurement protocol, we show how Alice and Bob can be duped into
being unwitting participants in the modified protocol, which also gives an illuminating if colloquial view of how
the "continuumization" of the measurement can take place. What is required is a modification of the makeup of
the classical channel between Alice and Bob. We imagine that when Alice transmits the results of a measurement,
thinking that it is going directly into the classical channel to Bob, it is actually intercepted by another party (Alice'),
who performs the necessary sequence of weak measurements. Here is a way that Alice' can implement this operation:
she examines the bit transmitted by Alice. If the bit is 0, she selects a slightly head-biased coin, flips it many
times, and transmits each outcome into the classical channel. If the bit is 1, she does the same thing with
a slightly tail-biased coin. At the other end of the channel there is another intercepting agent (Bob') who, after
studying a long enough string of coin flips sent by Alice', can with high confidence deduce the coin bias and report
the result to Bob. Alice and Bob are oblivious to this whole intervening process; nevertheless, as measured by the
data actually passing through the channel, the modified protocol with nearly continuous evolution of the available
information has been achieved.

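The coin-flipping relay of Alice' and Bob' can be simulated directly. In this hedged sketch, the bias `eta`, the number of flips, and the function names are assumptions chosen for illustration. Bob' tracks the posterior probability that Alice's bit was 0. Each toss moves that posterior by at most about `eta`, while thousands of tosses drive it close to certainty.

```python
import math
import random


def relay_bit(bit, eta, flips, rng):
    """Alice' (hypothetical): re-encode one bit as `flips` tosses of a coin with
    P(heads) = 1/2 + eta for bit 0 and 1/2 - eta for bit 1."""
    p_heads = 0.5 + eta if bit == 0 else 0.5 - eta
    return [rng.random() < p_heads for _ in range(flips)]


def logistic(x):
    """Numerically safe 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    z = math.exp(x)
    return z / (1 + z)


def posterior_path(tosses, eta):
    """Bob' (hypothetical): P(bit = 0 | tosses so far), starting from prior 1/2."""
    step = math.log((0.5 + eta) / (0.5 - eta))   # log-likelihood ratio of one toss
    log_odds, path = 0.0, [0.5]
    for heads in tosses:
        log_odds += step if heads else -step
        path.append(logistic(log_odds))
    return path


rng = random.Random(1)
eta = 0.05
path = posterior_path(relay_bit(0, eta, 4000, rng), eta)
largest_jump = max(abs(b - a) for a, b in zip(path, path[1:]))
print(round(path[-1], 6), round(largest_jump, 4))
```

The per-toss jump is bounded by a quarter of the log-likelihood step, so shrinking `eta` makes the accumulated information evolve as smoothly as desired, at the cost of more tosses.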
E. The state of affairs after stage I of the measurement

Having established that, whatever Alice and Bob's measurement protocol, we can view the probabilities as
evolving continuously in time, we declare that stage I of the measurement is complete when

$$\max_i p(\psi_i|m_I) = \frac{1}{9} + \epsilon, \tag{14}$$

that is, after the probabilities have evolved by a small but finite amount away from their initial value of 1/9. It should
be noted that since some measurement outcomes may be much more informative than others, the time of completion
of stage I is not fixed; it will in general require a greater number of rounds for one measurement record $m_I$ than for
another.

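The $\epsilon$ in Eq. (14) should be some definite, small, but noninfinitesimal number. Moreover, we require that all
posterior probabilities $p(\psi_i|m_I)$ be nonzero. For this any value $\epsilon < 1/72$ is acceptable, since each of the other
eight posteriors is at most $1/9 + \epsilon$ and all nine sum to one, so that

$$\min_i p(\psi_i|m_I) \ge \frac{1}{9} - 8\epsilon. \tag{15}$$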

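We now rewrite Bayes' theorem from Eq. (11):

$$p(\psi_i|m_I) = \frac{\langle\psi_i|E_{m_I}|\psi_i\rangle}{\sum_j \langle\psi_j|E_{m_I}|\psi_j\rangle} = \frac{\langle\alpha_i|a_{m_I}|\alpha_i\rangle\langle\beta_i|b_{m_I}|\beta_i\rangle}{\sum_j \langle\alpha_j|a_{m_I}|\alpha_j\rangle\langle\beta_j|b_{m_I}|\beta_j\rangle}. \tag{16}$$

Here we have introduced an abbreviated notation for several operators which will come up repeatedly in the upcoming
derivations:

$$E_{m_I} = S_{m_I}^\dagger S_{m_I} = a_{m_I} \otimes b_{m_I}, \qquad a_{m_I} = A_{m_I}^\dagger A_{m_I}, \qquad b_{m_I} = B_{m_I}^\dagger B_{m_I}. \tag{17}$$

Where there is no risk of confusion we will drop the index $m_I$ from $E_{m_I}$, $a_{m_I}$, and $b_{m_I}$.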
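It is easy to bound the greatest possible spread in the probability distribution:

$$\frac{8 + 72\epsilon}{8 - 9\epsilon} \le \max_{i,j} \frac{p(\psi_i|m_I)}{p(\psi_j|m_I)} = \max_{i,j} \frac{\langle\alpha_i|a|\alpha_i\rangle\langle\beta_i|b|\beta_i\rangle}{\langle\alpha_j|a|\alpha_j\rangle\langle\beta_j|b|\beta_j\rangle} \le \frac{1 + 9\epsilon}{1 - 72\epsilon}. \tag{18}$$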
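The constants in Eqs. (15), (18), and (19) follow from elementary bookkeeping on nine probabilities. The sketch below (function names hypothetical, `eps` chosen for illustration) evaluates the bounds and checks that two extreme distributions with $\max_i p = 1/9 + \epsilon$ attain the two ends of Eq. (18).

```python
import math


def stage_one_bounds(eps):
    """Bounds implied by stopping stage I at max_i p(psi_i|m_I) = 1/9 + eps."""
    if not 0 < eps < 1 / 72:
        raise ValueError("need 0 < eps < 1/72 so every posterior stays positive")
    min_floor = 1 / 9 - 8 * eps                     # Eq. (15)
    ratio_low = (8 + 72 * eps) / (8 - 9 * eps)      # left end of Eq. (18)
    ratio_high = (1 + 9 * eps) / (1 - 72 * eps)     # right end of Eq. (18)
    # Eq. (19): the ratio of products is at most the product of the separate
    # ratios for a and for b, so one of them must reach sqrt(ratio_low).
    single_factor = math.sqrt(ratio_low)
    return min_floor, ratio_low, ratio_high, single_factor


def extreme_ratios(eps):
    """max/min for the two distributions that saturate Eq. (18)."""
    flat = [1 / 9 + eps] + [(8 / 9 - eps) / 8] * 8   # the rest share equally
    skew = [1 / 9 - 8 * eps] + [1 / 9 + eps] * 8     # smallest allowed minimum
    return max(flat) / min(flat), max(skew) / min(skew)


eps = 0.005
print(stage_one_bounds(eps))
print(extreme_ratios(eps))
```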

An important technical consequence of declaring stage I complete at this point is that all the matrix elements
$\langle\alpha_i|a|\alpha_i\rangle$ and $\langle\beta_i|b|\beta_i\rangle$ are guaranteed to be nonzero; this condition will be used repeatedly in the analysis of
Appendix B (to be described shortly). The more crucial condition following from Eq. (18) is that either the inequality

$$\max_{i,j} \frac{\langle\alpha_i|a|\alpha_i\rangle}{\langle\alpha_j|a|\alpha_j\rangle} \ge \sqrt{\frac{8 + 72\epsilon}{8 - 9\epsilon}} \tag{19}$$

holds, or the corresponding inequality for $b$ holds. (The ratio of products in Eq. (18) is at most the product of the
separate maximal ratios for $a$ and for $b$, so at least one of them must reach the square root of the lower bound.)
This says that either the operator $a$ or $b$ differs from being proportional to the identity operator by a finite amount.
This will be the key fact in the analysis we are about to report.

The basic idea is that at the completion of stage I, from Alice's and Bob's point of view, there is a nonzero probability
that the initial state was any one of the nine. In order for Alice and Bob to complete the job of identifying which
state they have been given, with a reliability approaching 100%, it is necessary that the nine states remaining after
stage I, Eq. (13), still be almost perfectly distinguishable. That is, the states must still be nearly orthogonal. But we
can show that, because of Eq. (19), these residual states cannot be sufficiently orthogonal to complete the task. In
fact, we will be able to compute exactly to what extent they must be nonorthogonal. For we can show that, if we
assume that the overlap of any two of these residual states is $\delta$ or less, i.e.,

$$\max_{i \ne j} |\langle\phi_{i,m_I}|\phi_{j,m_I}\rangle| = \max_{i \ne j} \frac{|\langle\psi_i|E|\psi_j\rangle|}{\sqrt{\langle\psi_i|E|\psi_i\rangle\langle\psi_j|E|\psi_j\rangle}} = \delta, \tag{20}$$

then $a$ and $b$ will both be almost proportional to the identity operator, with relative corrections proportional to
$\delta$. This is done in Appendix B, where these corrections are derived precisely. The important consequence of this is
that
$$\max_{i,j} \frac{\langle\alpha_i|a|\alpha_i\rangle}{\langle\alpha_j|a|\alpha_j\rangle} \le 1 + O(\delta), \tag{21}$$

and the same for $b$. Equations (19) and (21) cannot both be satisfied unless $\delta$ is itself of order $\epsilon$, that is, unless the
residual states are nonorthogonal by a finite amount.
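A short calculation shows why a finite departure of $a$ from the identity spoils orthogonality. Assume, as in the standard domino assignment (an assumption here, not read from Eq. (3)), that $\psi_{6,7} = |1\pm2\rangle|0\rangle$. Take the hypothetical choice $E = a \otimes I$ with $a = \mathrm{diag}(1, 1+t, 1)$. The normalized overlap of Eq. (20) for this pair is then $t/(2+t)$, so keeping it at or below $\delta$ forces $t \le 2\delta/(1-\delta)$ for this pair alone.

```python
import math

R = 1 / math.sqrt(2)


def residual_overlap(alpha_i, alpha_j, a_diag):
    """|<phi_i|phi_j>| of Eq. (20) for E = a (x) I with a diagonal and real,
    when both states share the same Bob factor, so only Alice's part matters."""
    def form(u, v):                      # <u|a|v> for real vectors
        return sum(x * d * y for x, d, y in zip(u, a_diag, v))
    return abs(form(alpha_i, alpha_j)) / math.sqrt(form(alpha_i, alpha_i) * form(alpha_j, alpha_j))


alpha_6 = [0, R, R]      # ASSUMED |1+2>
alpha_7 = [0, R, -R]     # ASSUMED |1-2>
for t in (0.0, 0.1, 0.5):
    print(t, residual_overlap(alpha_6, alpha_7, [1, 1 + t, 1]), t / (2 + t))
```

This single-pair example only illustrates the direction of the argument; the general statement, with corrections for all of $a$ and $b$, is the content of Appendix B.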

So, at this point we can conclude that the measurement of Eq. (4) cannot be done bilocally, except with less than
100% accuracy; this is the main result that we set out to prove. We now proceed to a more quantitative analysis of 
bilocal approximations to this measurement. 



F. Information-theoretic analysis of the two-stage measurement 

We can now perform an analysis of the precise effects of this nonorthogonality, and derive an upper bound on the 
information attainable by Alice and Bob from any bilocal protocol. We will use the standard classical quantifier of 
information, the mutual information [26], which gives the amount of knowledge of one random variable (in our case,
the identity of the quantum state $\psi_i$) gained by having a knowledge of another (here, the outcome of the measurement).

Recall that we have broken the measurement by Alice and Bob into two stages. We call the random variable
describing the stage-I outcomes $M_I$. The outcomes of all subsequent (stage-II) measurements are denoted by the
random variable $M_{II}$. Alice and Bob's object is to deduce perfectly the label $i$ of one of the nine states $\psi_i$ (Eq. (3));
we use the symbol $W$ for this random variable (for "which wavefunction"). We quantify the information
attainable in the measurement by the mutual information $I(W; M_I, M_{II})$ between $W$ and the composite measurement
outcomes $M_I$ and $M_{II}$. For a perfect measurement, the attainable mutual information is $\log_2 9$; we will show that
$I(W; M_I, M_{II})$ must be less than this. We first use the additivity property of mutual information ([26], p. 125) to
write:

$$I(W; M_I, M_{II}) = I(W; M_{II}|M_I) + I(W; M_I). \qquad (22)$$

This expression introduces the mutual information between $W$ and $M_{II}$ conditional on $M_I$, which can be written as an average over all possible outcomes $m_I$ of the measurement in stage I:

$$I(W; M_{II}|M_I) = \sum_{m_I} p(m_I)\, I(W; M_{II}|m_I). \qquad (23)$$

Now, combining Eqs. (22) and (23) with the definition of the mutual information,

$$I(W; M_I) = H(W) - H(W|M_I), \qquad (24)$$

and using the fact that the entropy of the initial distribution is $H(W) = \log_2 9$, we obtain

$$I(W; M_I, M_{II}) = \log_2 9 - \sum_{m_I} p(m_I)\bigl[H(W|m_I) - I(W; M_{II}|m_I)\bigr]. \qquad (25)$$

To show that Eq. (25) must be less than $\log_2 9$, it suffices to show that each term of the sum is strictly positive. The conditions at the end of stage I make it possible for us to do this.

To make things explicit, suppose that at the end of stage I the residual quantum states (recall Eq. (13)) $\rho_i = |\phi_{i,m_I}\rangle\langle\phi_{i,m_I}|$ occur with probabilities $q_i = p(\psi_i|m_I)$ from Eq. (16). (No confusion will arise from leaving out the $m_I$ label.) Moreover, suppose that the measurement performed in stage II corresponds to a positive operator-valued measure $\{M_b\}$ fixed by the measurement outcome $m_I$. Then the explicit expression for the mutual information $I(W; M_{II}|m_I) = I(M_{II}; W|m_I)$ becomes

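$$\begin{aligned} I(M_{II}; W|m_I) &= H(M_{II}|m_I) - H(M_{II}|W, m_I) \\ &= -\sum_b (\mathrm{tr}\,\rho M_b)\log_2(\mathrm{tr}\,\rho M_b) + \sum_{i=1}^{9} q_i \sum_b (\mathrm{tr}\,\rho_i M_b)\log_2(\mathrm{tr}\,\rho_i M_b), \end{aligned} \qquad (26)$$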

where $\rho = \sum_i q_i\rho_i$. Note that $H(W|m_I) = -\sum_{i=1}^{9} q_i\log_2 q_i$.
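Equation (26) is straightforward to evaluate numerically. The following Python sketch (our own illustration, not part of the original analysis; the helper names are hypothetical) computes $I(M_{II}; W|m_I)$ for pure residual states, using $\mathrm{tr}\,\rho_i M_b = \langle\phi_i|M_b|\phi_i\rangle$. It assumes the states are normalized amplitude lists and that the caller supplies POVM elements that are positive and sum to the identity; neither assumption is checked.

```python
import math


def expectation(vec, mat):
    """Return <vec|mat|vec> (real part) for an amplitude list and a square matrix."""
    n = len(vec)
    total = 0j
    for r in range(n):
        for c in range(n):
            total += complex(vec[r]).conjugate() * mat[r][c] * vec[c]
    return total.real


def plogp(p):
    """p * log2(p) with the convention 0 * log2(0) = 0."""
    return p * math.log2(p) if p > 0 else 0.0


def stage2_information(states, priors, povm):
    """I(M_II; W | m_I) of Eq. (26) for pure states phi_i with probabilities q_i."""
    # probs[i][b] = tr(rho_i M_b) = <phi_i|M_b|phi_i>
    probs = [[expectation(phi, m) for m in povm] for phi in states]
    # tr(rho M_b) for the average state rho = sum_i q_i rho_i
    mixed = [sum(q * row[b] for q, row in zip(priors, probs)) for b in range(len(povm))]
    h_outcome = -sum(plogp(p) for p in mixed)                 # H(M_II | m_I)
    h_given_w = -sum(q * sum(plogp(p) for p in row)           # H(M_II | W, m_I)
                     for q, row in zip(priors, probs))
    return h_outcome - h_given_w


# Two orthogonal states measured in their own basis carry exactly one bit.
P0, P1 = [[1, 0], [0, 0]], [[0, 0], [0, 1]]
print(stage2_information([[1, 0], [0, 1]], [0.5, 0.5], [P0, P1]))  # 1.0
```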

Without loss of generality for the present set of manipulations, let us take $\phi_{1,m_I}$ and $\phi_{2,m_I}$ to be the two states assured to have a nonvanishing overlap $\langle\phi_{1,m_I}|\phi_{2,m_I}\rangle = \delta$ (recall Eq. (20)). We may partition the density operator $\rho$ according to the two states that interest us most, as follows. Let
$$\tau_1 = \frac{1}{s_1}\sum_{i=1}^{2} q_i\rho_i \quad\text{and}\quad \tau_2 = \frac{1}{s_2}\sum_{i=3}^{9} q_i\rho_i, \qquad (27)$$

where $s_1 = q_1 + q_2$ and $s_2 = 1 - s_1$. We can think of this partition as generating two new "which wavefunction" random variables $W_1$ and $W_2$; the probabilities associated with these random variables are just the renormalized ones appearing in Eq. (27). Note that $\rho = s_1\tau_1 + s_2\tau_2$. Then, by the classic converse to the concavity of the Shannon entropy ([27], p. 21), it follows that

$$-\sum_b (\mathrm{tr}\,\rho M_b)\log_2(\mathrm{tr}\,\rho M_b) \le -s_1\sum_b (\mathrm{tr}\,\tau_1 M_b)\log_2(\mathrm{tr}\,\tau_1 M_b) - s_2\sum_b (\mathrm{tr}\,\tau_2 M_b)\log_2(\mathrm{tr}\,\tau_2 M_b) + h(s_1), \qquad (28)$$

where $h(x) = -x\log_2 x - (1-x)\log_2(1-x)$ is the binary entropy function. Hence, if we write

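$$I(M_{II}; W_1|m_I) = -\sum_b (\mathrm{tr}\,\tau_1 M_b)\log_2(\mathrm{tr}\,\tau_1 M_b) + \sum_{i=1}^{2}\frac{q_i}{s_1}\sum_b (\mathrm{tr}\,\rho_i M_b)\log_2(\mathrm{tr}\,\rho_i M_b), \qquad (29)$$

$$I(M_{II}; W_2|m_I) = -\sum_b (\mathrm{tr}\,\tau_2 M_b)\log_2(\mathrm{tr}\,\tau_2 M_b) + \sum_{i=3}^{9}\frac{q_i}{s_2}\sum_b (\mathrm{tr}\,\rho_i M_b)\log_2(\mathrm{tr}\,\rho_i M_b), \qquad (30)$$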

it follows that 

$$I(M_{II}; W|m_I) \le s_1 I(M_{II}; W_1|m_I) + s_2 I(M_{II}; W_2|m_I) + h(s_1). \qquad (31)$$
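Equation (31) is the grouping property of the Shannon entropy applied to the stage-II outcome distribution. As a numerical sanity check (not a proof), the sketch below draws random pure states, random priors, and a random projective measurement, which is a special case of a POVM, and confirms the inequality. The dimension $d = 3$, the sample sizes, and all function names are arbitrary choices of ours.

```python
import math
import random


def plogp(p):
    return p * math.log2(p) if p > 0 else 0.0


def info(probs, priors):
    """Shannon mutual information between W (prior q_i) and outcome b, with p(b|i) = probs[i][b]."""
    mixed = [sum(q * row[b] for q, row in zip(priors, probs)) for b in range(len(probs[0]))]
    return (-sum(plogp(p) for p in mixed)
            + sum(q * sum(plogp(p) for p in row) for q, row in zip(priors, probs)))


def random_state(d, rng):
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(d)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in v))
    return [x / norm for x in v]


def random_basis(d, rng):
    """Random orthonormal basis by Gram-Schmidt; its projectors form a valid POVM."""
    basis = []
    while len(basis) < d:
        v = random_state(d, rng)
        for e in basis:
            c = sum(ec.conjugate() * vc for ec, vc in zip(e, v))
            v = [vc - c * ec for ec, vc in zip(e, v)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in v))
        if norm > 1e-9:
            basis.append([x / norm for x in v])
    return basis


def eq31_sides(seed, d=3, n=9):
    """Return (left, right) of Eq. (31) for random states, priors and measurement."""
    rng = random.Random(seed)
    states = [random_state(d, rng) for _ in range(n)]
    weights = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(weights)
    q = [w / total for w in weights]
    basis = random_basis(d, rng)
    # p(b|i) = |<e_b|phi_i>|^2 = tr(rho_i M_b) for the projective POVM M_b = |e_b><e_b|
    probs = [[abs(sum(ec.conjugate() * sc for ec, sc in zip(e, s))) ** 2 for e in basis]
             for s in states]
    s1 = q[0] + q[1]
    s2 = 1 - s1
    left = info(probs, q)
    i1 = info(probs[:2], [x / s1 for x in q[:2]])   # I(M_II; W_1 | m_I)
    i2 = info(probs[2:], [x / s2 for x in q[2:]])   # I(M_II; W_2 | m_I)
    binary = -plogp(s1) - plogp(s2)                  # h(s_1)
    return left, s1 * i1 + s2 * i2 + binary


for seed in range(5):
    left, right = eq31_sides(seed)
    print(f"{left:.4f} <= {right:.4f}")
```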
We can bound this further, so as to remove all dependence on the states $\phi_{3,m_I}$ through $\phi_{9,m_I}$, by noting that

$$I(M_{II}; W_2|m_I) \le H(W_2|m_I) = -\sum_{i=3}^{9}\frac{q_i}{s_2}\log_2\frac{q_i}{s_2}. \qquad (32)$$
Combining Eqs. (31) and (32) gives 

$$H(W|m_I) - I(W; M_{II}|m_I) \ge -\sum_{i=1}^{2} q_i\log_2 q_i + s_1\log_2 s_1 - s_1 I(M_{II}; W_1|m_I). \qquad (33)$$

Equation (33) can be further bounded so as to remove any explicit dependence on $q_1$ and $q_2$ by noting that, for fixed $s_1$, the first term in the expression on the right-hand side is minimized when $q_1 = q_2$. (One can verify this simply by taking a derivative with respect to one of the free variables.) Making that restriction, one can see furthermore that the resultant term is monotonically increasing in $q_1$. Thus the bound we are looking for can be found by taking $q_1$ to be its minimal allowed value, namely $q_1 = q_2 = \beta = \frac19 - 8\epsilon$ (recall Eq. (15)). With all that in place, we have



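$$H(W|m_I) - I(W; M_{II}|m_I) \ge 2\beta\left[1 + \sum_b (\mathrm{tr}\,\tau_1 M_b)\log_2(\mathrm{tr}\,\tau_1 M_b) - \frac12\sum_{i=1}^{2}\sum_b (\mathrm{tr}\,\rho_i M_b)\log_2(\mathrm{tr}\,\rho_i M_b)\right], \qquad (34)$$

where now $\tau_1 = \frac12(\rho_1 + \rho_2)$.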

Finally, it remains to remove all dependence on the quantum measurement $\{M_b\}$. This can be done by noting that the two rightmost terms on the right-hand side of Eq. (34) together equal minus the mutual information given by the measurement $\{M_b\}$ about the two equiprobable nonorthogonal quantum states $\phi_{1,m_I}$ and $\phi_{2,m_I}$ (cf. Eq. (29)). Optimizing over all quantum measurements, we obtain the accessible information of those two states [28]. Inserting that into Eq. (34) and recalling Eq. (20), we finally find

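$$H(W|m_I) - I(W; M_{II}|m_I) \ge 2\beta\, h\!\left(\tfrac12 - \tfrac12\sqrt{1-\delta^2}\right) = \left(\tfrac29 - 16\epsilon\right) h\!\left(\tfrac12 - \tfrac12\sqrt{1-\delta^2}\right), \qquad (35)$$

where $h(x)$ is again the binary entropy.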

The last bound becomes useful once we establish a quantitative link between $\epsilon$ and $\delta$ in Eq. (35). To do this, we must identify the value of $\delta$ for which, given all the constraints derived in Appendix B, it first becomes possible to satisfy Eq. (19) for some values of $i$ and $j$. It is this value of $\delta$ that must be used in the bound of Eq. (35). We have exhaustively examined all $i, j$ pairs to determine which one allows the greatest ratio of $a$ (or $b$) matrix elements for a given value of $\delta$. We find this to be the case for $i = 8$ and $j = 6$ in Eq. (3) (or other symmetry-equivalent choices). For this choice we can write



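$$\frac{\langle\alpha_8|a|\alpha_8\rangle}{\langle\alpha_6|a|\alpha_6\rangle} = \frac{a_{00} + a_{11} + 2\,\mathrm{Re}\,a_{01}}{a_{11} + a_{22} + 2\,\mathrm{Re}\,a_{12}}. \qquad (36)$$

This ratio attains its maximum value when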



1 + 6 1-6 „ 1 + 6 

These are the extremal values permitted by Eqs. (B11) and (B37). The value this gives is



max /. II \ ^ 5 = 1 — 7 : 7= to - ( 38 ) 

1,3 {aj\a\aj) \ 1 - o J 1 - jy £ V 1 - 5 2 

The smallest value of $\delta$ for which Eqs. (19) and (38) are consistent is given by the solution of the equation



/.<«>- V££- < 39 > 

Using Mathematica, we have found the choice of $\epsilon$ and $\delta$ consistent with Eq. (39) that gives the strongest bound on the mutual information in Eq. (35). We obtain

$$I(W; M_I, M_{II}) \le \log_2 9 - \Delta, \qquad (40)$$

where the mutual-information deficit is $\Delta = 0.00000531$ bits. This upper bound is attained when $\epsilon = 0.00823$, corresponding to a nonorthogonality parameter $\delta = 0.00344$ and a minimum-probability parameter $\beta = 0.0453 = (0.408)/9$. Thus we bound the information attainable by bilocal operations by Alice and Bob away from that attainable in a fully nonlocal measurement by a minute but finite amount.



III. SEARCHING FOR OPTIMAL LOCAL MEASUREMENTS 



Equation (40) gives our upper bound on the mutual information one can obtain by means of local operations and classical communication. However, it is unlikely that this bound is a close approximation to the actual optimal mutual information accessible in this way; most likely the optimal value is significantly lower. In this section we explore specific measurement strategies for our nine-state ensemble in order to get a sense of how well one can in fact distinguish the states by local means. We thereby obtain a lower bound on the mutual information.

We begin by considering a simpler problem, namely, distinguishing only eight of the nine states from each other. 
That is, we consider the case where the prior probability of one of the states is zero. 

As we noted earlier, state $\psi_1$ from Eq. (3) is special. In fact, it is never used in the analysis of Appendix B; thus its presence or absence is irrelevant to the nonorthogonality conditions we have derived. This means that this state is not necessary to make the measurement undoable bilocally. Thus, even if we take the prior probabilities of the states such that $p(\psi_1) = 0$, we still reach the conclusion that the full mutual information is unattainable by a bilocal procedure (although the quantitative analysis will differ from that given above).

The same is not true for the other states: if the prior probability of any of the states $\psi_2, \ldots, \psi_9$ is zero, then the measurement can be completed successfully by Alice and Bob. Figures 2 and 3 illustrate this for the case in which state $\psi_4$ is left out. One way of explaining the strategy is that, since the 4-5 domino of Fig. 2 is no longer complete, it can be cut by a von Neumann measurement, which will disturb state $\psi_5$ but still leave it distinguishable from all the other states. Thus the protocol can begin with cut 1 of Fig. 2, which corresponds to an incomplete von Neumann measurement by Bob that distinguishes his state $|2\rangle$ from states $|0\rangle$ or $|1\rangle$ (but does not distinguish between $|0\rangle$ and $|1\rangle$). The next step, taken by Alice, depends on the outcome reported to her by Bob, as indicated by the tree of Fig. 3; likewise, all four rounds of the measurement are contingent on the outcomes of preceding rounds. The object at every round is to move towards isolating a domino so that its pair of states can be distinguished by a measurement in the rotated basis.

FIG. 2. The sequence of measurements performed to distinguish the states of Fig. 1 if state $\psi_4$ is excluded. The dashed lines indicate von Neumann measurements; the italic numbers indicate the order in which they are performed. Dash-dotted lines indicate measurements in the rotated basis.
FIG. 3. A tree depicting the four stages of measurement indicated in Fig. 2. "A" and "B" indicate the party performing the measurement. B0/1 indicates that the 0 and 1 outcomes are not distinguished. The bold-faced numbers at the base of the tree indicate the states inferred from this chain of measurements.

We now turn to our original problem of distinguishing optimally among all nine states, assumed to have equal prior probabilities. The measurement strategy just described is a reasonable one to pursue even when all nine states are present. It accurately distinguishes states $\psi_1$, $\psi_2$, $\psi_3$ and $\psi_6$ through $\psi_9$, and it distinguishes these states from $\psi_4$ and $\psi_5$; it fails only to distinguish these last two states from each other. (In applying Fig. 3 to this case, one should imagine replacing "5" with "4 or 5.") Thus if Alice and Bob use this measurement, then with probability $\frac79$ they obtain the full $\log_2 9$ bits of information, and with probability $\frac29$ they are left one bit short; so the mutual information is $\log_2 9 - \frac29 = 2.9477$ bits. One can, however, do better, and we now present a series of improvements over this strategy.

We may express the improved measurements as sequences of positive operator-valued measures (POVMs). For example, Bob could start with a POVM consisting of elements $b_{r_1}$ (these are $3\times3$ matrices that must satisfy the constraint $\sum_{r_1} b_{r_1} = I$), after which Alice performs a measurement $\{a_{r_2}\}$, and so on. As it happens, all of our improved measurements can be represented in terms of POVMs whose elements are diagonal in the standard bases of Alice and Bob. It is therefore convenient to represent these POVM elements by their diagonal values. For example, in the measurement described above, Bob's opening POVM (in this case a von Neumann measurement), which distinguishes his state $|2\rangle$ from $|0\rangle$ and $|1\rangle$, has two elements, which we represent as $\{1, 1, 0\}$ and $\{0, 0, 1\}$.

Our first improvement is to replace this von Neumann measurement by a more symmetric POVM whose elements are $b_1 = \{1, \frac12, 0\}$ and $b_2 = \{0, \frac12, 1\}$. (If Bob were to perform this measurement when his part of the system was in the central state $|1\rangle$, the outcome would be random.) Note that each outcome of this measurement rules out one of the columns of Fig. 1; that is, it rules out one of Bob's states $|0\rangle$ or $|2\rangle$. Once this has been done, Alice may freely cut either the 6-7 domino or the 8-9 domino, and from this point Bob and Alice may proceed as above to find out (with no further damage) in which domino the actual state lies. However, Bob's initial measurement damages both the 2-3 domino and the 4-5 domino, so that at the end he will not be able to distinguish perfectly between $\psi_2$ and $\psi_3$ or between $\psi_4$ and $\psi_5$. Thus, in order to evaluate the mutual information obtainable via this strategy, we need to know the effect of Bob's initial POVM on these four states. This effect depends on which operation element $B_{r_1}$ we choose to associate with the POVM element $b_{r_1}$. Any $B_{r_1}$ satisfying $B_{r_1}^\dagger B_{r_1} = b_{r_1}$ is allowed, but it is simplest to let $B_{r_1} = |r_1\rangle\otimes\sqrt{b_{r_1}}$, where $|r_1\rangle$ is the classical record of the outcome. To see how this measurement affects the states, let us suppose that the actual state is $\psi_4$, so that Bob's part of the system begins in the state $|\phi\rangle = \frac{1}{\sqrt2}(|1\rangle + |2\rangle)$.

Then if Bob gets the outcome $b_1$, the final state of Bob's part of the system (not including the classical record) is $\sqrt{b_1}|\phi\rangle = \frac12|1\rangle$, and if he gets the outcome $b_2$ the final state is $\sqrt{b_2}|\phi\rangle = \frac12|1\rangle + \frac{1}{\sqrt2}|2\rangle$. (These states are automatically subnormalized so that their squared norms are the probabilities of the corresponding outcomes, namely $\frac14$ and $\frac34$.) If the initial state had been $\psi_5$, then the results would have been the same but with $|2\rangle$ replaced by $-|2\rangle$. Thus the first outcome renders $\psi_4$ and $\psi_5$ completely indistinguishable, while the second merely makes them nonorthogonal. In the latter case Bob can, at the end, try to determine whether the original state was $\psi_4$ or $\psi_5$ by performing the optimal measurement for distinguishing two equally likely nonorthogonal states [28]. In this case the optimal measurement is simply the orthogonal measurement whose outcomes are B(1+2) and B(1−2), i.e., Bob's states $\frac{1}{\sqrt2}(|1\rangle \pm |2\rangle)$. Similar considerations apply to the states $\psi_2$ and $\psi_3$. One finds that this strategy yields a mutual information of 2.9964 bits, which is an improvement over the strategy of Fig. 3.

A further improvement is gained by replacing Bob's initial POVM by a less informative and less destructive one whose elements are $\{p, \frac12, 1-p\}$ and $\{1-p, \frac12, p\}$, where $\frac12 < p < 1$. The rest of the measurement is left unchanged. Optimizing over $p$, one finds that this strategy can yield 3.009 bits of mutual information. Note, however, that in this case Bob's initial measurement does not rule out any column of Fig. 1, so that when Alice later cuts a domino, she may be cutting the actual state, in which case her action will cost them one bit. One may suspect that Alice should be more careful, and indeed the mutual information is improved if she makes a weaker measurement. In fact, the best strategy we have found delays until the fourth round a measurement that guarantees the complete cutting of a domino.

This best strategy consists of the following steps, in which the values of the parameters $p$, $q$, $r$, $s$, and $t$ are to be determined by optimization:

1. Bob: $\{p, \frac12, 1-p\}$ vs $\{1-p, \frac12, p\}$. Let us assume that Bob gets the first outcome. (In the other case all the POVM elements appearing in the succeeding steps have their diagonal values reversed; that is, the roles of states $|0\rangle$ and $|2\rangle$ are interchanged.)

2. Alice: $\{0, 1-q, 1-r\}$ vs $\{1, q, r\}$. The first outcome cuts the 8-9 domino, and we go directly to step 5. The second outcome makes it safer for Bob to risk cutting the 4-5 domino, so we proceed to step 3.

3. Bob: $\{1-s, 1-t, 0\}$ vs $\{s, t, 1\}$. The first outcome cuts the 4-5 domino, and we go directly to step 5. The second outcome makes it safer for Alice to cut the 6-7 domino, so we proceed to step 4.

4. Alice: $\{1, 1, 0\}$ vs $\{0, 0, 1\}$. Either outcome cuts the 6-7 domino.

5. At this point some domino has been cut, so that Alice and Bob can proceed as above to determine in which domino the actual state lies. If this domino contains two states that have not been collapsed into the same state, Alice and Bob then perform a measurement to try to distinguish them.

Optimizing over the values of the parameters, we find that the mutual information is $\log_2 9 - 0.1575 = 3.0125$ bits. (One set of parameter values giving this result is $p = 0.726$, $q = 0.395$, $r = 0.312$, $s = 0.071$, $t = 0.104$.) Moreover, numerical evidence indicates that no further advantage is gained by allowing another round before making a firm cut (it would be a cut of the 2-3 domino, as we proceed clockwise around the grid). Thus it is conceivable that this value of the mutual information is indeed optimal, though we cannot rule out an entirely different strategy that does better.
Summarizing the results of this section and the preceding one, we have

$$\log_2 9 - 0.1575 \le I(W; M_I, M_{II}) \le \log_2 9 - \Delta. \qquad (41)$$



Note that the results presented in this section can be seen as a realization of the ideas behind our proof in Section II. Alice and Bob begin by performing a sequence of POVMs aimed at determining in which domino the actual state lies; this sequence can be thought of as stage I of the measurement. At this point, just as in our proof, the states remaining to be distinguished have become nonorthogonal, so that the final mutual information must fall short of $\log_2 9$ bits.







FIG. 4. A modification of the tree of Fig. 3 which shows how all nine states can be reliably distinguished with some quantum 
communication from Alice to Bob. The wavy lines indicate the episodes of quantum transmission; the first transmission permits 
Bob to locally do a measurement involving both A and B pieces of the Hilbert space. 



IV. A REALIZATION OF THE TWO-PARTY SEPARABLE SUPEROPERATOR WITH SHARED QUBITS

Having established that the measurement can only be done approximately if Alice and Bob communicate only classically, it is natural to ask what quantum resources would permit them to complete the measurement. Clearly they can do it if Alice ships her entire 3-state system to Bob and he performs the full operation in his lab, reporting the result classically back to Alice. In the case of all 9 states having equal prior probability, this requires the transmission of $\log_2 3 \approx 1.58496$ qubits. If state $\psi_1$ is left out and the other 8 states are equiprobable, the density matrix of the state held by Alice, $\mathrm{diag}(\frac38, \frac14, \frac38)$, has less than maximal entropy: $h_3(\frac38, \frac14, \frac38) = \frac{11}{4} - \frac34\log_2 3 \approx 1.56128$ bits. By the Schumacher compression theorem [29], this means that if Alice and Bob perform many shots of the same measurement on states drawn from the same ensemble, then the quantum transmission from Alice to Bob can be compressed to 1.56128 qubits per shot.

However, in the nine-state case we can exhibit a protocol for completing the measurement that requires a smaller overall number of qubits to be transmitted. It starts with the imperfect protocol involving only classical communication just discussed (Fig. 4), and adds a part that permits states $\psi_4$ and $\psi_5$ to be perfectly distinguished. This requires only $h(\frac13) + \frac29 \approx 1.14152$ qubits (over many repetitions of the measurement). For the 8-state case the same protocol requires $h(\frac38) + \frac14 \approx 1.20443$ qubits of transmission, which is also less than the straightforward protocol needs. In neither case do we know that the procedures discussed here are optimal.

The modified protocol for the 9-state case begins with Alice transmitting the $|2\rangle$ component of her Hilbert space to Bob. She could clearly do this by sending 1 qubit if she adopts a 3-qubit unary encoding of her Hilbert space, i.e., $|0\rangle \to |100\rangle$, $|1\rangle \to |010\rangle$, and $|2\rangle \to |001\rangle$. In fact the third qubit in this representation has less than maximal entropy, namely $h(\frac13)$ (it has higher entropy, $h(\frac38)$, for the 8-state case). Thus, again using Schumacher's theorem [29], the transmission can be compressed over many realizations of the measurement so that only $h(\frac13)$ of a qubit per measurement needs to be transmitted.

As indicated by the tree in Fig. 4, Bob's possession of $|2\rangle_A$ permits him to immediately perform a measurement that distinguishes whether the state is $\psi_4$, $\psi_5$, or one of the others. After this has been done, the sequence of measurements proceeds exactly as in the classical protocol (Fig. 3), except that some possibilities can be pruned off, as they correspond to $\psi_4$ and $\psi_5$ cases that have already been distinguished. Before completing round 4, Alice must again be in possession of $|2\rangle_A$, which requires a qubit transmission back from Bob. This qubit is not compressible, but this transmission is only required if the state is $\psi_6$ or $\psi_7$, which happens only $\frac29$ of the time, and so counts as $\frac29$ qubits of transmission ($\frac14$ for the 8-state case).

Adding up the qubit transmissions at the beginning and the end of Fig. 4 gives $h(\frac13) + \frac29 \approx 1.14152$ qubits, as mentioned above. This transmission can be made unidirectional, since a qubit sent in one direction, if it is entangled with a qubit left behind, may always be used to teleport a qubit in the opposite direction [4]. Note that even with the assistance of qubit transmissions, this protocol requires several rounds of classical transmission; it is a true "two-way" protocol, that is, one requiring bidirectional classical communication [30].

V. THERMODYNAMICS OF NONLOCAL MEASUREMENTS AND STATE PREPARATION 

A. Irreversibility of measurement 

We now explore another information-theoretic feature of our two-party measurement, which illustrates in another way the nonlocality of this orthogonal measurement. If the parts of the quantum states are assembled in one location, then a measurement in any orthogonal basis, in addition to being doable with 100% fidelity, can be done reversibly. That is, the quantum state can be converted into classical data without discarding any information to the environment. Therefore, by Landauer's principle [31], no heat is generated during the measurement. The reversible method can be illustrated by a simple qubit example: if the measurement is to distinguish $|0\rangle$ from $|1\rangle$, and the classical record of the bit is to be stored in the macrostates $|\underline{0}\rangle = |000\ldots\rangle$ and $|\underline{1}\rangle = |111\ldots\rangle$ (containing, say, $10^{23}$ qubits), then the procedure involves starting the macrosystem in a standard state (so that the initial state of the measured qubit together with the macrosystem is either $|0000\ldots\rangle$ or $|1000\ldots\rangle$), then performing repeated quantum XOR operations [30] with the qubit to be measured as the source and all the qubits of the macrostate as the targets. In the end, the measured qubit may as well be considered part of the macrosystem containing the classical answer. Note that no interaction with any other environment is necessary to complete this, or any other, local orthogonal measurement.

The situation is rather different for our two-party orthogonal measurement. Consider a case in which the measurement can be achieved by Alice and Bob, for example the case in which state $\psi_4$ is promised not to be present. Although Alice and Bob can perform this measurement, they clearly cannot do so reversibly, i.e., as a finite sequence of local reversible operations and classical communications. In the protocol described in Fig. 2, the irreversibility arises in the first step, where, if the state is $\psi_5$, it is irreversibly transformed to either state $|2\rangle|1\rangle$ or $|2\rangle|2\rangle$. Thus in this case 1 bit of entropy is produced. If each of the eight permitted states occurs with equal probability, then the average entropy generated is $\frac18$ of a bit. We cannot prove that this entropy of measurement is minimal, though we have found no more efficient protocol. Many other cases can easily be worked out; for example, if it is promised that the state is one of only four (say, $\psi_2$, $\psi_4$, $\psi_6$, and $\psi_8$), then $\frac14$ of a bit of entropy will be generated by the obvious protocol.

It appears that reversible measurements are only possible if the set of states can be progressively dissected by Alice and Bob without breaking any dominoes. To formalize this notion, we introduce a few definitions. Let $S = \{\psi_i\}$ be a set of pure product states shared between Alice and Bob, where $\psi_i = \alpha_i\otimes\beta_i$. Given such a set, we define a splitting of $S$ by Alice as a partition of $S$ into two nonempty disjoint subsets $S = S_1 \cup S_2$ such that $\langle\alpha_i|\alpha_j\rangle = 0$ for all $\psi_i \in S_1$ and all $\psi_j \in S_2$. A splitting by Bob is defined similarly. A set $S$ is dissectible if there is a tree each of whose interior nodes is a splitting by Alice or Bob and whose leaves are singletons. For example, using the numbering of Eq. (3) and Fig. 1, the set $\{\psi_2, \psi_6, \psi_8\}$ is dissectible, but $\{\psi_2, \psi_4, \psi_6, \psi_8\}$ is not. The dissectibility of an arbitrary set $S$ can be determined by examining finitely many possible splitting trees. Clearly any subset of a dissectible set is dissectible.
It is evident that if an ensemble of states $\mathcal{E} = \{p_i, \psi_i\}$ has support only on a dissectible set, then both its entropy of preparation and its entropy of measurement are zero. It is tempting to argue that, conversely, nondissectible sets, if they are locally measurable at all, have positive entropies of measurement; but to be sure of this, one would have to exclude the (unlikely-seeming) possibility of multistep measurement procedures which, while not strictly reversible for any finite $n$, would succeed in identifying each of the states in the nondissectible set with error probability and entropy production both tending to 0 in the limit of large $n$.

A further analysis of this irreversibility reveals that it can be thought of as originating in the necessity for classical communication between Alice and Bob. To ensure that the channel between them can convey only classical and no quantum information, the channel itself must possess a quantum environment (in order to dephase the data passing through it). This raises the possibility that Alice or Bob will be obliged to become entangled with the environment of the channel in the course of communicating the necessary classical information, thereby acquiring a finite amount of entropy. Exactly the same amount must also appear in the channel environment. When, for example, Alice and Bob have been given state $\psi_5 = |2\rangle\otimes\frac{1}{\sqrt2}(|1\rangle - |2\rangle)$, and Bob sends the result of his first measurement in Fig. 3 (collapsing his state to a mixture of $|1\rangle$ and $|2\rangle$) to Alice, he has created entanglement between the measurement outcome and the environment, so that the joint system of message and environment is left in an entangled state of the form $|1\rangle\otimes e_1 + |2\rangle\otimes e_2$, where $e_1$ and $e_2$ are two orthogonal states of the environment.

Note that measurement protocols requiring classical communication are not inevitably irreversible. For example, for the dissectible set $\{\psi_2, \psi_6, \psi_8\}$ considered previously, a bit of communication from Bob to Alice is required to complete the measurement; still, no entropy is generated. This is so because this bit is guaranteed to be in one of the computational basis states, precisely the states with which the dephasing channel does not become entangled. It is the necessity, in the above example, of delivering to the channel a bit that is in a superposition of basis states that leads to the entanglement and the irreversibility.

B. Irreversibility of state preparation 

For dissectible sets of states, such as $\{\psi_2, \psi_6, \psi_8\}$, the mapping

$$|i\rangle\otimes|i\rangle \leftrightarrow |\alpha_i\rangle\otimes|\beta_i\rangle \qquad (42)$$

(using the notation of Eq. (3)) between classical instructions and the state described is locally reversible and can be performed in either direction without generating waste information. Conversely, nondissectible sets, such as $\{\psi_2, \psi_4, \psi_6, \psi_8\}$, cannot be prepared by any finite sequence of reversible operations, and we conjecture that even asymptotic multistep protocols could not reduce either the heat of preparation or the heat of measurement to zero. Perhaps surprisingly, the heats of preparation and measurement, by the best protocols we have been able to discover, are unequal.

To give an example of irreversible state preparation, consider the following method for preparing the nondissectible set $\{\psi_2, \psi_4, \psi_6, \psi_8\}$ mentioned above. The protocol, which is the best we know, produces $h(\frac14) \approx 0.811$ bits of entropy, considerably more than the entropy of measurement. The procedure works as follows. First, Bob computes a function $f$ of the preparation instruction $i$ that records whether the state to be synthesized is $\psi_4$ ($f(4) = 1$) or one of the others ($f(2) = f(6) = f(8) = 0$), saving the result in a work bit. Then Alice and Bob reversibly prepare the modified four states of Fig. 5; that is, if the instruction is to prepare $\psi_4$, $\psi_4'$ is prepared, and in the other three cases exactly the desired state is produced.

This preparation can be carried out reversibly because the modified set $\{\psi_2, \psi_4', \psi_6, \psi_8\}$ is dissectible. Next, Bob performs a Hadamard rotation on his state ($|2\rangle \to |1+2\rangle$, $|1\rangle \to |1-2\rangle$, $|0\rangle \to |0\rangle$) conditional on the value of $f(i)$, which transforms $\psi_4'$ into $\psi_4$ and leaves the other three states unchanged, as desired. Finally Bob erases his work bit $f(i)$, which requires discarding $h(\frac14)$ bits of entropy into the environment. Similar reasoning shows that the equiprobable nine-state ensemble can be prepared at a cost of $\approx 0.764$ bits of entropy, and the equiprobable eight-state ensemble (without the center state) at a cost of $\approx 0.811$ bits.

It should perhaps be noted that the local preparation and measurement protocols we have described, while irreversible from the viewpoint of Alice and Bob, become reversible when viewed from a global perspective, including Bob, Alice, and the environment. In the preparation protocol just described, this global reversibility arises because the waste classical information $f(i)$ discarded into the environment in the last step is not random, but is instead entirely determined by the joint state $\psi_i$ of Alice and Bob. Therefore discarding it, though it increases the entropy of the environment, does not increase the entropy of the universe. The global reversibility of the measurement protocol for this same set of four states arises because the information discarded into the environment in the final stage is merely the other half of the entanglement created at an earlier stage of the protocol, when one of the dominoes might have been collapsed. Thus the final act of discarding restores the environment to a pure state.







FIG. 5. A set of four states, shown in the domino notation, which can be prepared locally by Alice and Bob in a reversible 
fashion. 



When speaking of the thermodynamic costs of local preparation and local measurement, it should be recalled that, although any set of product states can be locally prepared, not all sets can be locally measured. The full set of nine states {$\psi_1, \ldots, \psi_9$} of Eq. (3), for example, is not locally measurable at all, no matter how much heat generation is allowed. Conversely, there are sets of pure bipartite states that cannot be prepared locally, even with the generation of heat, because one or more states in the set is entangled. The concepts of entropy of preparation and entropy of measurement can nevertheless be extended to such sets, indeed to any orthogonal set of pure bipartite states. This is done by allowing Alice and Bob to draw on a reservoir of prior entanglement (e.g., standard singlets $\Psi^- = \frac{1}{\sqrt{2}}(|01\rangle - |10\rangle)$ shared between them) to help perform actions, such as teleportation [4], that could not otherwise be done locally. In this fashion one can define an entanglement-assisted entropy of local preparation and an entanglement-assisted entropy of local measurement. 

In entanglement-assisted measurement, an otherwise immeasurable set like the original set of nine states is rendered measurable by teleporting quantum information as required, say, in the protocol of Fig. 4. However, each teleportation generates two bits of waste classical information per qubit teleported, thereby contributing to the entropy of measurement. Again we can calculate the amounts of entanglement consumed and entropy produced by simple protocols, without knowing whether more efficient ones exist. The protocols described earlier give an entanglement-assisted entropy of measurement of 2.28304 bits for the equiprobable nine-state ensemble, and 2.40886 bits for the eight-state ensemble (omitting the central state). In each case this is twice the amount of entanglement consumed, because the protocols generate no other waste information aside from that associated with the teleportations. 

Turning now to entanglement-assisted preparation, a typical set of states requiring entanglement to prepare from classical directions is the set of four Bell states [30] {$\Phi^+, \Phi^-, \Psi^+, \Psi^-$}. The entropy of preparation by the obvious protocol in this case is two bits per state prepared. Bob reads the classical directions, applies an appropriate Pauli rotation to his half of a standard singlet to make the desired Bell state, then throws away the classical directions.
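The obvious Bell-state preparation protocol can be checked with a minimal NumPy sketch. It assumes a particular encoding, chosen here for illustration only: Bob turns two classical bits into one of $I$, $X$, $Z$, $XZ$ and applies it to his half of a standard singlet. The sketch confirms that the four outputs are orthonormal. Discarding the two uniformly distributed direction bits therefore costs $\log_2 4 = 2$ bits.

```python
import numpy as np

# Single-qubit kets and the Pauli operators Bob may apply.
ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Standard singlet Psi^- = (|01> - |10>)/sqrt(2); qubit order is (Alice, Bob).
singlet = (np.kron(ket0, ket1) - np.kron(ket1, ket0)) / np.sqrt(2)

# Hypothetical map from two bits of classical directions to Bob's Pauli rotation.
paulis = {"00": I2, "01": X, "10": Z, "11": X @ Z}

def prepare(directions):
    """Bob applies the Pauli named by the directions to his half of the singlet."""
    return np.kron(I2, paulis[directions]) @ singlet

states = [prepare(d) for d in sorted(paulis)]
gram = np.array([[np.vdot(s, t) for t in states] for s in states])
print(np.allclose(gram, np.eye(4)))   # True: four orthonormal (Bell) states
print(np.log2(len(paulis)))           # 2.0 bits of directions discarded per state
```

For example, `prepare("01")` gives $(|00\rangle - |11\rangle)/\sqrt{2} = \Phi^-$. Other encodings of the bits would work equally well.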

Finally, suppose Alice and Bob are given an unknown member of the 9-state set (or some other locally immeasurable set). They wish to determine which state they have without the help of entanglement, but with some hints from a person who knows which state they have been given. We define the "advice of measurement" as the minimal amount of advice needed, in conjunction with their own local actions, to guide Alice and Bob to the right answer. As we have seen above, a negative hint like "the state is not $\psi_4$" is sufficient. This might appear to be a lot of advice, as much as a totally informative positive hint like "the state is $\psi_3$". In fact, however, such negative hints are highly compressible by classical hashing techniques, asymptotically requiring only about 0.171 bits per hint in the nine-state case. Appendix C gives details of the compression of these types of hints.

We note, however, that the non-von Neumann measurements discussed at the end of Sec. III allow an even more efficient form of advice. There it was shown that an appropriate POVM yields $3.0125 \approx \log_2 9 - 0.1575$ bits of information about the unknown state in the 9-state case. Therefore, after Alice and Bob have performed their POVM, only 0.1575 bits of additional information need be provided asymptotically for them to identify the state exactly.

As an aside, we note that the value of advice, and the amount needed, may depend on its timing. Although in the 
9-state measurement problem the most efficient advice we know of can safely be given at the end, after the POVM 
has been completed, there are other situations in quantum information theory, not to mention in everyday life, when 
early advice is more useful than late advice. In the BB84 quantum key distribution protocol [18], for example, the 
basis information may be regarded as a form of advice that is delayed to make it less useful to the eavesdropper. In 
a deterministic setting, where the adviser can foresee all future events, nothing is lost by giving all necessary advice 
at the beginning. But when unforeseen events are possible, the most efficient kind of advice, better than prior or 
posterior advice, may be as-needed or concurrent advice. Suppose Alice and Bob are about to begin a long car trip. 
They ask their more experienced friend Eve which route to take. A few days later they telephone again, asking her 
how to repair a flat tire. To be helpful, the route advice must be given at the beginning, but it would be wasteful 
to give the repair advice then because the flat tire might not have happened. The prominent role of measurements, 
whether von Neumann or POVM, with unpredictable outcomes, in our analysis of the 9-state problem suggests that 
as-needed advice might be the optimal kind here also. 

The notion of advice of measurement can be extended to sets of entangled states as well, for example the set of four Bell states. Here one bit of advice is sufficient (e.g., whether the unknown Bell state is of the + or − type). The other bit ($\Phi$ vs. $\Psi$) can be learned by comparing the results of local measurements in the z basis. The table summarizes the various measures of nonlocality for some of the ensembles we have been considering.



Ensemble                  9-state   2468    246    4-Bell   2-Bell
Locally Preparable        Yes       Yes     Yes    No       No
Locally Measurable        No        Yes     Yes    No       Yes
Dissectible               No        No      Yes    No       No
Entropy of Prep.          0.764     0.811   0      2        1
Entropy of Meas.          2.283     0.250   0      2        1
Entanglement of Prep.     0         0       0      1        1
Entanglement of Meas.     1.142     0       0      1        0
Advice of Meas.           0.1575    0       0      1        0



Notes: Entropies, entanglements, and advice for non-Bell ensembles are upper bounds from known protocols; actual values could be less. Entropies of measurement for the 9-state and 4-Bell ensembles are for entanglement-assisted measurement, since these ensembles are otherwise not locally measurable. The nine-state ensemble consists of the 9 equiprobable states $\psi_1, \ldots, \psi_9$ of Eq. (3) and Fig. 1. The 2468 and 246 ensembles are equiprobable distributions over {$\psi_2, \psi_4, \psi_6, \psi_8$} and {$\psi_2, \psi_4, \psi_6$}, respectively. The 4-Bell ensemble consists of the four equiprobable Bell states {$\Phi^+, \Phi^-, \Psi^+, \Psi^-$}, and the 2-Bell ensemble of two equiprobable Bell states, e.g., {$\Phi^+, \Psi^+$}.
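The claim that one bit of advice suffices for the four Bell states can be checked with a small Python sketch. The dictionary-of-amplitudes representation is simply a choice made for this sketch. In the z basis, the $\Phi$ states only ever give equal outcomes for Alice and Bob, and the $\Psi$ states only unequal ones, so comparing local results reveals $\Phi$ vs. $\Psi$. The ± sign leaves the z statistics unchanged, and that sign is what the single bit of advice must supply.

```python
import math

r = 1 / math.sqrt(2)
# Bell states as {(alice_bit, bob_bit): amplitude} in the z basis.
BELL = {
    "Phi+": {(0, 0): r, (1, 1): r},
    "Phi-": {(0, 0): r, (1, 1): -r},
    "Psi+": {(0, 1): r, (1, 0): r},
    "Psi-": {(0, 1): r, (1, 0): -r},
}

def possible_parities(state):
    """Values of a XOR b over all z-basis outcome pairs (a, b) with nonzero probability."""
    return {a ^ b for (a, b), amp in state.items() if abs(amp) > 0}

def z_statistics(state):
    """Joint z-basis outcome probabilities; the relative sign never shows up here."""
    return {outcome: abs(amp) ** 2 for outcome, amp in state.items()}

for name, state in BELL.items():
    print(name, possible_parities(state))
# Phi states give parity 0 and Psi states parity 1, while Phi+ and Phi- (likewise
# Psi+ and Psi-) have identical z statistics: that sign is the bit of advice.
print(z_statistics(BELL["Phi+"]) == z_statistics(BELL["Phi-"]))
```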
VI. A THREE-PARTY SEPARABLE SUPEROPERATOR 



We shall now show another example of a separable von Neumann measurement, this time involving three parties, Alice, Bob and Carol, each holding just a qubit (two-state system). We have not performed a full analysis of this case. It appears, however, to share the key property of the 9-state measurement above: partial measurement causes indistinguishability of the residual states. This suggests that it is another case in which the measurement cannot be done locally by the three parties, even if they may engage in any amount of classical communication among themselves. The superoperator involves a complete orthonormal set of eight product states spanning the eight-dimensional Hilbert space. This appears to be the smallest possible Hilbert space that still exhibits such behavior. It is easy to show, by a simple elimination process, that a qubit-trit system or a qubit-qubit system is not sufficient. The eight states are:





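$$
\begin{array}{lllll}
 & \text{Alice} & \text{Bob} & \text{Carol} & \\
\phi_1 = & |0\rangle & |0\rangle & |0\rangle & = 000, \\
\phi_2 = & |1\rangle & |1\rangle & |1\rangle & = 111, \\
\phi_3 = & |0+1\rangle & |0\rangle & |1\rangle & = {+}01, \\
\phi_4 = & |0-1\rangle & |0\rangle & |1\rangle & = {-}01, \\
\phi_5 = & |0\rangle & |1\rangle & |0+1\rangle & = 01{+}, \\
\phi_6 = & |0\rangle & |1\rangle & |0-1\rangle & = 01{-}, \\
\phi_7 = & |1\rangle & |0+1\rangle & |0\rangle & = 1{+}0, \\
\phi_8 = & |1\rangle & |0-1\rangle & |0\rangle & = 1{-}0
\end{array}
\tag{43}
$$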



(leaving out normalizations). On the right side of these equations we introduce an obvious shorthand for these states, which we will use in the Discussion. We now indicate the evidence that the separable superoperator consisting of the operators

$$ S_i = |i\rangle_A |i\rangle_B |i\rangle_C \langle\phi_i| \tag{44} $$

cannot be performed by 3-local operations, in which Alice, Bob, and Carol can only perform local quantum operations and broadcast classical information to each other.
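As a quick sanity check of Eq. (43), the following pure-Python sketch confirms that the eight labeled product states are orthonormal. Eight orthonormal vectors in a $2\times2\times2$ space form a complete basis. For product states the overlap factorizes into local overlaps, which is all the code uses. The label strings follow the shorthand on the right of Eq. (43); the helper names are introduced only for this illustration.

```python
import math
from itertools import combinations

s = 1 / math.sqrt(2)
# Normalized single-qubit vectors (real amplitudes) for the symbols of Eq. (43).
VEC = {"0": (1.0, 0.0), "1": (0.0, 1.0), "+": (s, s), "-": (s, -s)}

# Shorthand labels of Eq. (43); characters are Alice's, Bob's and Carol's vectors.
EQ43 = ["000", "111", "+01", "-01", "01+", "01-", "1+0", "1-0"]

def local_overlap(x, y):
    return sum(p * q for p, q in zip(VEC[x], VEC[y]))

def overlap(label_i, label_j):
    """<phi_i|phi_j> for product states is the product of the local overlaps."""
    return math.prod(local_overlap(x, y) for x, y in zip(label_i, label_j))

normalized = all(math.isclose(overlap(t, t), 1.0) for t in EQ43)
orthogonal = all(math.isclose(overlap(t, u), 0.0, abs_tol=1e-12)
                 for t, u in combinations(EQ43, 2))
print(normalized, orthogonal, len(EQ43) == 2 ** 3)
```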

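The arguments are equivalent to those in the two-trit example, and again rely on considering any measurement as a two-stage process. In the case where all prior probabilities are equal (1/8 in this case), we declare stage I to be complete when

$$ \max_i p(\phi_i|m_I) = \frac{1}{8} + \epsilon \tag{45} $$

with some positive $\epsilon$ smaller than 1/56. It is again simple to bound the greatest possible spread of the probability distribution. Since the eight probabilities sum to one, the smallest of them is at least $1 - 7(\frac{1}{8} + \epsilon) = \frac{1}{8} - 7\epsilon$, so that

$$ \max_{i,j} \frac{p(\phi_i|m_I)}{p(\phi_j|m_I)} = \max_{i,j} \frac{\langle\phi_i|E|\phi_i\rangle}{\langle\phi_j|E|\phi_j\rangle} \le \frac{1/8+\epsilon}{1/8-7\epsilon} = \frac{1+8\epsilon}{1-56\epsilon}. \tag{46} $$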



As before, this equation guarantees that all diagonal matrix elements of $E$, $\langle\phi_i|E|\phi_i\rangle = \langle\phi_i|a\otimes b\otimes c|\phi_i\rangle$, are nonzero. In addition, Eq. (45), whose right-hand side exceeds 1/8, guarantees that the maximum and minimum of these matrix elements are different. Also as before, we can show that the states after stage I become nonorthogonal, which should permit us to derive a definite mutual-information deficit. We will not develop this proof here, but will give a simple sketch of how we prove that the states are nonorthogonal. We show here only that the states cannot be exactly orthogonal, that is, that the conditions

$$ |\langle\phi_i|a\otimes b\otimes c|\phi_j\rangle| = 0, \qquad \forall\, i \neq j, \tag{47} $$

cannot all hold. This proof can be generalized step by step into a full analysis as in Appendix B.
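1) Writing the orthogonality condition for $i = 3$ and $j = 4$ gives the condition

$$ (a_{00} - a_{01} + a_{10} - a_{11})\, b_{00}\, c_{11} = 0. \tag{48} $$

The diagonal matrix elements of $b$ and $c$ must be nonzero by the arguments following Eq. (46). For example, $\langle\phi_3|E|\phi_3\rangle$ contains the factor $b_{00}c_{11}$, $\langle\phi_5|E|\phi_5\rangle$ the factor $a_{00}b_{11}$, and $\langle\phi_7|E|\phi_7\rangle$ the factor $a_{11}c_{00}$. Hence the $a$ factor must be zero, and since $a$ is Hermitian, taking the real part gives

$$ a_{00} = a_{11}. \tag{49} $$

2) Taking $i = 5$ and $j = 6$ and applying the same reasoning gives

$$ c_{00} = c_{11}. \tag{50} $$

3) And taking $i = 7$ and $j = 8$ gives

$$ b_{00} = b_{11}. \tag{51} $$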
4) Now we write the four orthogonality conditions coming from all combinations of $i = 3, 4$ and $j = 5, 6$, each with the sign it carries in the sum below:

$$
\begin{aligned}
(a_{00}+a_{10})\,b_{01}\,(c_{10}+c_{11}) &= 0,\\
-(a_{00}+a_{10})\,b_{01}\,(c_{10}-c_{11}) &= 0,\\
(a_{00}-a_{10})\,b_{01}\,(c_{10}+c_{11}) &= 0,\\
-(a_{00}-a_{10})\,b_{01}\,(c_{10}-c_{11}) &= 0.
\end{aligned}
\tag{52}
$$

Adding these four equations gives

$$ 4\, a_{00}\, b_{01}\, c_{11} = 0. \tag{53} $$

Since $a_{00} \neq 0$ and $c_{11} \neq 0$, we conclude (using the Hermiticity of $b$) that

$$ b_{01} = 0 = b_{10}. \tag{54} $$

5) Doing the same for the equations involving $i = 3, 4$ and $j = 7, 8$ gives

$$ c_{01} = 0 = c_{10}. \tag{55} $$

6) And finally, from the equations involving $i = 5, 6$ and $j = 7, 8$, we get

$$ a_{01} = 0 = a_{10}. \tag{56} $$

Putting observations 1) through 6) together, we conclude that $a$, $b$ and $c$ must be proportional to the identity operator. But then all the diagonal matrix elements of $E$ would be equal. This is inconsistent with Eqs. (45) and (46), according to which the largest and smallest of them must differ by a finite amount. When developed more fully, this result should contradict the assumption that the measurement could be done even approximately by 3-local operations.

Note that nothing in the argument involves the simple product states $\phi_1$ or $\phi_2$. We conclude that the measurement is still not doable locally even if these two states are promised to be absent. On the other hand, it is easy to show that eliminating any one of the states $\phi_3, \ldots, \phi_8$ would permit the measurement to be done. The layout of these states in the Hilbert space, shown in Fig. 6, gives some intuition for why this should be true, as in the two-party case. Any simple von Neumann measurement by one party cuts one of these "dumbbells", making that pair of states indistinguishable.
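The sign bookkeeping in steps 4-6 can be checked numerically. This NumPy sketch uses random positive semidefinite $2\times2$ matrices as stand-ins for $a$, $b$ and $c$, together with the unnormalized vectors of Eq. (43). It confirms that the signed sums of the four matrix elements $\langle\phi_i|a\otimes b\otimes c|\phi_j\rangle$ reduce to $4a_{00}b_{01}c_{11}$, $4a_{11}b_{00}c_{10}$ and $4a_{01}b_{11}c_{00}$, respectively. The particular sign patterns for steps 5 and 6 are our reading of "doing the same". The check covers only these algebraic identities, not the rest of the argument.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_psd(d=2):
    """Random positive semidefinite d x d matrix, a stand-in for a, b or c."""
    m = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return m @ m.conj().T

a, b, c = random_psd(), random_psd(), random_psd()
E = np.kron(np.kron(a, b), c)

# Unnormalized local vectors, following "leaving out normalizations" in Eq. (43).
V = {"0": [1, 0], "1": [0, 1], "+": [1, 1], "-": [1, -1]}

def ket(label):
    out = np.array([1], dtype=complex)
    for ch in label:
        out = np.kron(out, np.array(V[ch], dtype=complex))
    return out

def elem(li, lj):
    """<phi_i| a (x) b (x) c |phi_j> for states given by their shorthand labels."""
    return np.vdot(ket(li), E @ ket(lj))

checks = {
    "step 4": (elem("+01", "01+") - elem("+01", "01-")
               + elem("-01", "01+") - elem("-01", "01-"), 4 * a[0, 0] * b[0, 1] * c[1, 1]),
    "step 5": (elem("+01", "1+0") + elem("+01", "1-0")
               - elem("-01", "1+0") - elem("-01", "1-0"), 4 * a[1, 1] * b[0, 0] * c[1, 0]),
    "step 6": (elem("01+", "1+0") - elem("01+", "1-0")
               + elem("01-", "1+0") - elem("01-", "1-0"), 4 * a[0, 1] * b[1, 1] * c[0, 0]),
}
for step, (signed_sum, predicted) in checks.items():
    print(step, np.isclose(signed_sum, predicted))
```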




FIG. 6. The layout of the eight states of Eq. (43) in the 2x2x2 Hilbert space. The "dumbbells" have a meaning similar to that of the dominoes in Fig. 1. 
Finally, the most economical technique that we have found for making the measurement doable with quantum communication is for a whole qubit to be sent from one party to another. That is, no compression of the quantum information seems to be possible in this case, whether or not states $\phi_1$ or $\phi_2$ are excluded. It is easy to show that the two-party measurement required after this qubit transmission is doable by local actions.



VII. DISCUSSION 



The results of this paper, extensive as they are, raise many additional fundamental questions about multipartite quantum measurements, to most of which we have only incomplete answers at this time. We would indeed be pleased if the ambitious reader has a notion of how to attack any of the following puzzles: 

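There is a variety of simple variants on the separable measurements presented in this paper for which we do not know how to prove or disprove bilocality. One is a very obvious generalization of the 9-state example (Alice's state is written first, Bob's second):

$$
\begin{aligned}
\psi_1 &= |1\rangle\,|1\rangle, \\
\psi_2 &= |0\rangle\,(\cos\theta_{23}|0\rangle + \sin\theta_{23}|1\rangle), \\
\psi_3 &= |0\rangle\,(-\sin\theta_{23}|0\rangle + \cos\theta_{23}|1\rangle), \\
\psi_4 &= |2\rangle\,(\cos\theta_{45}|2\rangle + \sin\theta_{45}|1\rangle), \\
\psi_5 &= |2\rangle\,(-\sin\theta_{45}|2\rangle + \cos\theta_{45}|1\rangle), \\
\psi_6 &= (\cos\theta_{67}|2\rangle + \sin\theta_{67}|1\rangle)\,|0\rangle, \\
\psi_7 &= (-\sin\theta_{67}|2\rangle + \cos\theta_{67}|1\rangle)\,|0\rangle, \\
\psi_8 &= (\cos\theta_{89}|0\rangle + \sin\theta_{89}|1\rangle)\,|2\rangle, \\
\psi_9 &= (-\sin\theta_{89}|0\rangle + \cos\theta_{89}|1\rangle)\,|2\rangle.
\end{aligned}
\tag{57}
$$

That is, each domino pair is rotated by a different angle. While we strongly doubt that the case of general $\theta$s is any different from the case $\theta = \pi/4$ that we have analyzed, we have no proof that these general states specify a nonlocal measurement.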
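The following NumPy sketch confirms that Eq. (57) defines an orthonormal product basis of the $3\times3$ space for every choice of the four angles. At $\theta = \pi/4$ it reproduces the original nine states. This is only a consistency check of the construction. The helper names are introduced here for illustration, and nothing in the code bears on the open question of bilocality.

```python
import numpy as np

def rot_pair(t, lo, hi):
    """cos t|lo> + sin t|hi> and -sin t|lo> + cos t|hi> as vectors in C^3."""
    e = np.eye(3)
    return (np.cos(t) * e[lo] + np.sin(t) * e[hi],
            -np.sin(t) * e[lo] + np.cos(t) * e[hi])

def nine_states(t23, t45, t67, t89):
    """The states of Eq. (57) as 9-component vectors, Alice's factor first."""
    e = np.eye(3)
    b2, b3 = rot_pair(t23, 0, 1)   # Bob's factors of psi_2, psi_3
    b4, b5 = rot_pair(t45, 2, 1)   # Bob's factors of psi_4, psi_5
    a6, a7 = rot_pair(t67, 2, 1)   # Alice's factors of psi_6, psi_7
    a8, a9 = rot_pair(t89, 0, 1)   # Alice's factors of psi_8, psi_9
    return np.array([np.kron(e[1], e[1]),
                     np.kron(e[0], b2), np.kron(e[0], b3),
                     np.kron(e[2], b4), np.kron(e[2], b5),
                     np.kron(a6, e[0]), np.kron(a7, e[0]),
                     np.kron(a8, e[2]), np.kron(a9, e[2])])

rng = np.random.default_rng(1)
angles = rng.uniform(0, np.pi, size=4)
states = nine_states(*angles)
print(np.allclose(states @ states.T, np.eye(9)))  # True for any choice of angles
```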

We have noted that there is no 2x2 example of a separable but nonlocal superoperator involving pure states. There is, however, a mixed-state measurement which has some very curious properties. It is a measurement to distinguish two density matrices $\rho_0$ and $\rho_1$. Here $\rho_0$ is an equal mixture of the pure product states 0+ and +0 (we use the notation introduced in Eq. (43)), and $\rho_1$ is an equal mixture of 11 and −−. This measurement involves distinguishing two separable, orthogonal states. It appears that, despite this, the measurement cannot be done bilocally; indeed, it apparently cannot be done by any separable superoperator! It is easy to show that the projection measurement onto the supports of these states can produce an entangled output from an unentangled input. For instance, projecting the product state 00 onto the support of $\rho_0$ yields a state proportional to 0+ + +0, which is entangled. No separable superoperator can do this. It will be interesting to understand the minimum degree of nonlocality needed to perform this measurement.
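The two facts used here can be verified with a short NumPy sketch. First, $\rho_0$ and $\rho_1$ are orthogonal, $\mathrm{Tr}\,\rho_0\rho_1 = 0$. Second, projecting the product input 00 onto the support of $\rho_0$ gives an entangled output. The projector is built by QR-orthonormalizing the two spanning vectors, which is one convenient choice among several. Entanglement is detected with the standard criterion for two-qubit pure states: such a state is a product state exactly when its $2\times2$ coefficient matrix is singular.

```python
import numpy as np

k0, k1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
kp, km = (k0 + k1) / np.sqrt(2), (k0 - k1) / np.sqrt(2)

def proj(v):
    return np.outer(v, v.conj())

# rho_0: equal mixture of 0+ and +0; rho_1: equal mixture of 11 and --.
rho0 = 0.5 * (proj(np.kron(k0, kp)) + proj(np.kron(kp, k0)))
rho1 = 0.5 * (proj(np.kron(k1, k1)) + proj(np.kron(km, km)))
print(np.isclose(np.trace(rho0 @ rho1), 0.0))   # True: orthogonal supports

# Orthogonal projector onto span{0+, +0}, the support of rho_0.
q, _ = np.linalg.qr(np.column_stack([np.kron(k0, kp), np.kron(kp, k0)]))
P0 = q @ q.conj().T

out = P0 @ np.kron(k0, k0)        # project the product input 00
coeffs = out.reshape(2, 2)        # rows: Alice's index, columns: Bob's index
print(out)                        # proportional to 0+ + +0
print(abs(np.linalg.det(coeffs)) > 1e-9)        # True: the output is entangled
```

Working by hand, the projected vector is $(2|00\rangle + |01\rangle + |10\rangle)/3$. Its coefficient matrix has determinant $-1/9$.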

A nonlocal measurement would yield 1 bit of information, since $\rho_0$ and $\rho_1$ are orthogonal. It would be interesting to try to apply the techniques developed in this paper to determine a bound on the mutual information attainable by a bilocal approximation to this measurement.

There are other multi-party examples for which such proofs would also be desirable. A modified version of the 2x2x2 example above involves just four states:

$$ 01{+}, \qquad 1{+}0, \qquad {+}01, \qquad {-}{-}{-} \tag{58} $$



These states do not correspond to a separable trace-preserving superoperator, because the projector onto the complement of the subspace they span is not separable [32]. Nevertheless, this can be viewed as a measurement game. Alice, Bob, and Carol are promised that they are given one of these four states, and their object is to determine, using only classical communication, which state it is. We suspect that they cannot, but we have not been able to prove it.
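The four states of Eq. (58) can be checked with a short pure-Python sketch. Besides mutual orthogonality, it runs a standard pigeonhole test (our addition, not an argument given in the text). The test shows that no product state is orthogonal to all four, so the complementary four-dimensional subspace contains no product vector. This is consistent with the statement that the complement is not separable.

```python
import math
from itertools import combinations

s = 1 / math.sqrt(2)
VEC = {"0": (1.0, 0.0), "1": (0.0, 1.0), "+": (s, s), "-": (s, -s)}
SHIFTS = ["01+", "1+0", "+01", "---"]   # Eq. (58); parties are Alice, Bob, Carol

def dot(x, y):
    return sum(p * q for p, q in zip(VEC[x], VEC[y]))

def overlap(u, v):
    """Inner product of two 3-qubit product states given by their labels."""
    return math.prod(dot(x, y) for x, y in zip(u, v))

mutually_orthogonal = all(abs(overlap(u, v)) < 1e-12 for u, v in combinations(SHIFTS, 2))

# Pigeonhole test: a product state orthogonal to all four states needs, for each state,
# one party on which it is locally orthogonal. With 4 states and 3 parties, some party
# would serve two states, and on a qubit one nonzero vector can be orthogonal to two
# vectors only if those two are parallel. So it suffices that no party has a parallel pair.
no_parallel_pair = all(not math.isclose(abs(dot(u[p], v[p])), 1.0)
                       for p in range(3) for u, v in combinations(SHIFTS, 2))
print(mutually_orthogonal, no_parallel_pair)
```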

An even more exotic set of orthogonal states that we have considered involves 10 parties, each with a qubit. This set of states again involves only the basis vectors 0, 1, +, and − locally, so that a typical one of the 1024 basis states is 1+−+0−−110. This construction emerges from a counterexample to a conjecture in tiling theory, the Keller conjecture [33]. The violation of this conjecture means that the 1024 states do not conform to the domino or dumbbell layout of the examples in this paper, in which pairs of dimensions of the Hilbert space are covered by pairs of orthogonal states. We have not attempted to prove non-10-locality for this example, but we note that there is no simple von Neumann measurement that will distinguish them.

Curiously, despite the complexity of the example, we are able to show that just two copies of any state are sufficient for the 10 parties to distinguish the state locally with classical communication, as in all the examples considered in Sec. IV. The procedure is simple: each party measures its qubit from one copy in the 0/1 basis, and its qubit from the second copy in the +/− basis. This raises another question. Are there any sets of states, entangled or not, for which more than two (but finitely many) copies of the state are necessary for distinguishing the states reliably? So far we have found no examples where more than two copies of the unknown state are needed [34]. Indeed, we know of no examples of two orthogonal pure states, product or entangled, which require more than one copy to be reliably distinguished. Earlier in this section we noted a set of two orthogonal mixed states of two qubits which appears to be locally immeasurable. But here too, two copies are sufficient to make the states distinguishable. It would appear that further work on the tiling problems could produce other interesting examples for numbers of parties between 3 and 10.
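The two-copy procedure can be simulated exhaustively for any orthogonal set of product states whose local vectors are drawn from {0, 1, +, −}. The 1024 ten-party states are not listed here. This sketch therefore uses the eight states of Eq. (43) and the four states of Eq. (58) as stand-ins. For every state and every possible pattern of random outcomes, it checks that exactly one member of the set is consistent with the results. Each party's vector must equal one of its two readouts, because whichever basis contains that vector gives it deterministically. A two-state, non-orthogonal control case shows the check can fail.

```python
from itertools import product

Z_BASIS, X_BASIS = ("0", "1"), ("+", "-")

def outcomes(local):
    """Possible (copy-1 result in 0/1 basis, copy-2 result in +/- basis) for one party."""
    z_results = [local] if local in Z_BASIS else list(Z_BASIS)   # deterministic or random
    x_results = [local] if local in X_BASIS else list(X_BASIS)
    return [(z, x) for z in z_results for x in x_results]

def consistent(label, results):
    """A candidate is consistent if every party's vector matches one of its two results."""
    return all(v in pair for v, pair in zip(label, results))

def always_identified(states):
    """True if, for every state and every outcome pattern, exactly one state is consistent."""
    for s in states:
        for results in product(*(outcomes(v) for v in s)):
            if [t for t in states if consistent(t, results)] != [s]:
                return False
    return True

EQ43 = ["000", "111", "+01", "-01", "01+", "01-", "1+0", "1-0"]
SHIFTS = ["01+", "1+0", "+01", "---"]
print(always_identified(EQ43), always_identified(SHIFTS))   # True True
print(always_identified(["0", "+"]))                         # False: not orthogonal
```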

The domino representation of two-party quantum states bears some resemblance to an approach taken in classical 
communication complexity problems to finding the most efficient interactive scheme for evaluating a function of data 
held by both Alice and Bob with the minimum classical communication [35] . The resemblance comes when the one-bit 
output of the function is depicted in a two-dimensional table; then the most efficient communication is determined by 
a recursive subdivision of such a table into unanimous blocks. It remains to be seen whether this observation would 
lead to more examples of interesting separable quantum operations. 

The present investigation has required a very precise distinction between different types of quantum operations which are normally considered identical. Going back to the 9-state calculation, we can consider two further quantum operations related to the measurement operation of Eq. (4), which is repeated here as Eq. (59):

$$ |i\rangle_A |i\rangle_B \langle\psi_i|, \tag{59} $$

$$ |\psi_i\rangle |\psi_i\rangle \langle\psi_i|, \tag{60} $$

$$ |\psi_i\rangle \langle\psi_i|. \tag{61} $$

We have disproved the existence of a bilocal implementation of (59). From this we can also rule out (60), which is a cloning operator. Alice and Bob could perform this cloning many times, then perform measurements to deduce the state label $i$ with very high confidence, thus performing (59). We can also rule out any form of weak cloning [36]. 

The case of (61) is more subtle, since we normally think of these projection operators as precisely what we mean by the measurement (59). This is true in a one-party world, since performing the projection means that a classical record of the state is available somewhere in the world. In a multi-party situation, however, this record could be split between the parties in a way that would require quantum communication to unravel. Therefore, we emphatically state that (59) and (61) are not generally identical in a multi-party scenario. Indeed, we note that there is another case in which two such operators are completely different. For Bell states, the measurement operator of Eq. (59) cannot be implemented bilocally, because the states are entangled. The dephasing operator of Eq. (61) for the Bell states, by contrast, can be implemented bilocally; it has been described as the "twirling" operation of Ref. [30].
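The statement that dephasing in the Bell basis, Eq. (61), is bilocal can be illustrated numerically. This NumPy sketch assumes one standard way of doing it. Alice picks one of $I$, $X$, $Y$, $Z$ at random and tells Bob which, and both apply it. Averaging over the four choices removes every off-diagonal Bell-basis term, because each $\sigma\otimes\sigma$ has the Bell states as eigenvectors, with ±1 sign patterns that cancel pairwise. The test state is an arbitrary random density matrix.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)

r = 1 / np.sqrt(2)
BELL = [np.array(v, dtype=complex) * r for v in
        ([1, 0, 0, 1], [1, 0, 0, -1], [0, 1, 1, 0], [0, 1, -1, 0])]

def bell_dephase(rho):
    """Eq. (61) for Bell states: sum_i |B_i><B_i| rho |B_i><B_i|."""
    out = np.zeros((4, 4), dtype=complex)
    for v in BELL:
        P = np.outer(v, v.conj())
        out += P @ rho @ P
    return out

def local_twirl(rho):
    """Both parties apply the same Pauli (announced classically); average over I, X, Y, Z."""
    ops = [np.kron(s, s) for s in (I2, X, Y, Z)]
    return sum(U @ rho @ U.conj().T for U in ops) / 4

rng = np.random.default_rng(2)
m = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho = m @ m.conj().T
rho /= np.trace(rho)
print(np.allclose(bell_dephase(rho), local_twirl(rho)))   # True
```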

Nevertheless, we have been able to prove that (61) cannot be implemented bilocally for the 9-state example. The arguments are quite different from those given for Eq. (59), and the proof is presented in Appendix D. But the issue of approximations to (61), or to (60), remains unsettled. That is, we do not know how to quantify the precision with which Alice and Bob could perform these operations approximately. A large part of the difficulty is that we cannot use a simple, classical measure of information like the mutual information. Such a measure was possible for (59) because its output is a classical record. For (60) and (61), an operator measure involving a notion of distance between two quantum operators would have to be used. The theory of such operator measures is considerably less well developed [37].

It seems likely that the states we have explored in this paper would be usable for quantum cryptography, but we also have more questions than answers on this point. It is now clear [17] that bipartite orthogonal states are generally useful for cryptography when one particle in the state is received by Bob before the other has been launched by Alice. This forces Eve to measure one particle at a time. Suppose first that Eve had no quantum memory and that only one round of measurement (one transmission from Alice to Bob) were permitted. The security of the cryptographic protocol would then be assured if the measurement of the state could not be performed bilocally. The nine states that we have analyzed have this property. However, given that Eve can have a quantum memory, the problem is a bit different, corresponding to some restricted form of quantum communication between Alice and Bob in the measurement protocol. In the cryptographic application, of course, Eve has more work to do: she must determine the identity of the state and provide it undisturbed, at the appropriate times, to Bob [38]. Thus, a separate study is required to establish that the nine states form a good basis for orthogonal quantum cryptography (which, however, is easily provided by the analysis of [17]). At the same time, the nine states might provide a stronger cryptographic primitive for some purposes, since they cannot be identified even by repeated communication between Alice and Bob. Another useful feature of the states as a cryptographic primitive might be the fact that two copies of them can be identified exactly. But we have no concrete notions of what these new cryptographic applications might be.

Finally, we note that the basic question which began our investigation remains unanswered: what is a compact 
mathematical description of a superoperator which can be performed by only classical communication between the par- 
ties? We have only disproved one natural hypothesis, that this set coincides with the set of separable superoperators. 
No alternative hypothesis has presented itself. 

All of these questions indicate, we think, that we still have many very basic questions about the structure of 
quantum mechanics, about the nature of quantum nonlocality and of entanglement, questions whose answers will be 
of central significance in our quest to employ quantum mechanics in the transmission and processing of information. 
APPENDIX A: DECOMPOSITION OF ARBITRARY POVM INTO A SERIES OF VERY WEAK MEASUREMENTS 

Any superoperator acting on a system of dimension $n$ can be replaced (nonuniquely) by the following procedure:

- appending an ancilla of dimension $n_1$;
- performing a unitary transformation;
- tracing out a subsystem of dimension $n_2$;
- measuring, with a standard and complete measurement, a subsystem of dimension $n_3$, which we call a probe.

The state of the remaining system, of dimension $m = n n_1/(n_2 n_3)$, can then be calculated. It is uniquely determined for any given superoperator, despite the nonuniqueness of the procedure. In cases where there is no probe to be measured ($n_3 = 1$), this is the so-called trace-preserving superoperator. If instead the trace-out step is eliminated ($n_2 = 1$), this is the most general POVM (positive-operator-valued measure). Since we are interested in the optimal extraction of information, nothing need be discarded, and it is this second case that concerns us. Thus, the most general POVM can be replaced by three operations: the appending of an ancilla, a unitary transformation, and the standard measurement of a subsystem.

Suppose we are given a state, on which we will obtain some information using a POVM. We will show how to 
approximate this POVM by a continuous process. The addition of an ancilla does not influence the state; the unitary 
transformation can be done as continuously as we wish. We shall now show that a standard complete measurement 
can be replaced by a continuous process (to any desired approximation). As a result of the above discussion, any 
POVM can be approximated in the same way. 
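The replacement of a POVM by an ancilla, a unitary and a standard probe measurement can be made concrete with a NumPy sketch. It builds the standard Naimark-type isometry $V = \sum_k \sqrt{E_k}\otimes|k\rangle$, which can be completed to a unitary acting on the system together with a probe prepared in $|0\rangle$. It then checks that measuring the probe reproduces the POVM statistics. The qubit "trine" POVM is an arbitrary example chosen for this sketch, and the function names are illustrative.

```python
import numpy as np

def psd_sqrt(m):
    """Square root of a Hermitian positive semidefinite matrix via eigendecomposition."""
    w, v = np.linalg.eigh(m)
    return v @ np.diag(np.sqrt(np.clip(w, 0, None))) @ v.conj().T

def dilation(povm):
    """Isometry V = sum_k sqrt(E_k) (x) |k> from the system into system (x) probe."""
    n, K = povm[0].shape[0], len(povm)
    V = np.zeros((n * K, n), dtype=complex)
    for k, E in enumerate(povm):
        probe = np.zeros((K, 1))
        probe[k, 0] = 1.0
        V += np.kron(psd_sqrt(E), probe)
    return V

# Example POVM (an arbitrary choice): the qubit trine, E_k = (2/3)|t_k><t_k|.
trine = [np.array([np.cos(2 * np.pi * k / 3), np.sin(2 * np.pi * k / 3)]) for k in range(3)]
povm = [(2 / 3) * np.outer(t, t) for t in trine]

V = dilation(povm)
print(np.allclose(V.conj().T @ V, np.eye(2)))    # True: V is an isometry

psi = np.array([1.0, 0.0])
phi = V @ psi
K = len(povm)
# Probability of probe outcome k: <phi| I (x) |k><k| |phi>.
p_probe = [np.vdot(phi, np.kron(np.eye(2), np.diag(np.eye(K)[k])) @ phi).real
           for k in range(K)]
p_povm = [psi @ E @ psi for E in povm]
print(np.allclose(p_probe, p_povm))               # True: probe statistics match the POVM
```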

In order to measure the probe (a subsystem of dimension $n_3$) using a complete standard measurement in a basis $|i\rangle$, we write the combined state of the remaining system and the probe after the unitary interaction as

$$ |\psi\rangle = \sum_{i=0}^{n_3-1} a_i\, |\phi_i\rangle |i\rangle. \tag{A1} $$

This combined system has dimension $n n_1 = m n_3$, recalling that $n_2 = 1$ for a POVM. Here $\sum_i |a_i|^2 = 1$, and the $|\phi_i\rangle$ are normalized states (not necessarily orthogonal) of the remaining $m$-dimensional subsystem. Without loss of generality we can assume that the probe is a qubit, since any other measurement can be replaced by a sequence of yes/no questions; thus $n_3 = 2$.

In a standard measurement we apply the projection postulate directly to the probe. This yields a classical result $i$ with probability $|a_i|^2$, and leaves the remaining subsystem in the state $|\phi_i\rangle$. In a nondemolition measurement [39], a state $|i\rangle$ is transformed to $|i\rangle|i\rangle_1$ and the new system ($|i\rangle_1$) is measured instead of the probe. Hence a probe in a state $|i\rangle$ is not demolished by this measurement. By attaching $K$ such devices to $|\psi\rangle$, the measurement of the probe can be done in a nondemolition way, using a unitary transformation to the state

$$ |\Phi\rangle = \sum_{i=0}^{n_3-1} a_i\, |\phi_i\rangle |i\rangle |i\rangle_1 \cdots |i\rangle_K. \tag{A2} $$

Now the measurement postulate can be applied to any (or all) of the additional "quantum measuring devices" $|i\rangle_k$, where $1 \le k \le K$. We use the term quantum measuring device (QMD) to indicate that no classical measurement (no actual "printout") was performed at that stage. As a result, this measurement process is reversible until we apply the projection postulate to one of these QMDs, and the state $|\psi\rangle$ can be reproduced from $|\Phi\rangle$ with perfect fidelity. Measuring any of these QMDs is equivalent to performing a standard measurement on the probe.

To obtain an approximation using a continuous measurement we replace the QMDs by "weak QMDs" (WQMDs), 
meaning that we replace a standard measurement by a sequence of weak measurements. (Weak measurements were 
first discussed by Aharonov and others [25].) The unitary transformation producing (A2) is replaced by one leading 
to 



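$$ |\Psi\rangle = a_0|\phi_0\rangle|0\rangle|0'\rangle_1\cdots|0'\rangle_K + a_1|\phi_1\rangle|1\rangle|1'\rangle_1\cdots|1'\rangle_K, \tag{A3} $$

where the two possible states of the $k$th WQMD, $|0'\rangle_k$ and $|1'\rangle_k$, are highly overlapping. We can always choose them to be

$$ |0'\rangle = \cos\theta\,|0\rangle + \sin\theta\,|1\rangle, \qquad |1'\rangle = \sin\theta\,|0\rangle + \cos\theta\,|1\rangle, \tag{A4} $$

with $\theta = \pi/4 - \epsilon$ for a small positive $\epsilon$. If the probe starts in the state $|i\rangle$, then the probability of obtaining the correct result $i$ from a WQMD in the state $|i'\rangle$ is

$$ \cos^2\theta = \tfrac{1}{2}[1 + \sin(2\epsilon)]. \tag{A5} $$

We approximate:

$$ \cos\theta \approx \tfrac{1}{\sqrt{2}}\left[1 + \tfrac{\sin(2\epsilon)}{2}\right] \approx \tfrac{1}{\sqrt{2}}(1+\epsilon), \qquad \sin\theta \approx \tfrac{1}{\sqrt{2}}(1-\epsilon). \tag{A6} $$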

For any state $|\psi\rangle$, if only one WQMD is measured (in the computational basis), the effect of this measurement on the rest of the system is weak. The state of the original system can then be reproduced with a fidelity that approaches one as $\epsilon$ approaches zero. For instance, if the result 0 is obtained, we can reproduce an unnormalized state of the remaining system and the probe,

$$ |\psi_{\rm out}\rangle = a_0\cos\theta\,|\phi_0\rangle|0\rangle + a_1\sin\theta\,|\phi_1\rangle|1\rangle, \tag{A7} $$

which by Eq. (A6) is a modification of $|\psi\rangle$ of order $\epsilon$:

$$ |\psi_{\rm out}\rangle \approx \tfrac{1}{\sqrt{2}}\big\{|\psi\rangle + \epsilon\,[a_0|\phi_0\rangle|0\rangle - a_1|\phi_1\rangle|1\rangle]\big\}. \tag{A8} $$

Thus, by measuring the WQMDs one at a time, we obtain a process which is as close to continuous as we want, since we can choose $\epsilon$ as small as we want.
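How weakly a single WQMD readout disturbs $|\psi\rangle$ can be quantified in a few lines of Python. The sketch treats $|\phi_0\rangle|0\rangle$ and $|\phi_1\rangle|1\rangle$ as orthonormal, which they are because of the probe. It then computes the fidelity between $|\psi\rangle$ and the normalized state of Eq. (A7). Working this out by hand gives exactly $1 - F = 4|a_0|^2|a_1|^2\sin^2\epsilon \,/\, [1 + (|a_0|^2 - |a_1|^2)\sin 2\epsilon]$. The infidelity is therefore of order $\epsilon^2$, and for equal amplitudes $F = \cos^2\epsilon$.

```python
import math

def fidelity_after_one_wqmd(a0, a1, eps):
    """Fidelity |<psi|psi_out>|^2 / <psi_out|psi_out> for |psi> = a0|e0> + a1|e1> and the
    unnormalized |psi_out> = a0 cos(theta)|e0> + a1 sin(theta)|e1> of Eq. (A7),
    theta = pi/4 - eps; |e0> = |phi_0>|0> and |e1> = |phi_1>|1> are orthonormal."""
    theta = math.pi / 4 - eps
    p0, p1 = abs(a0) ** 2, abs(a1) ** 2          # assumes p0 + p1 = 1
    overlap = p0 * math.cos(theta) + p1 * math.sin(theta)
    norm2 = p0 * math.cos(theta) ** 2 + p1 * math.sin(theta) ** 2
    return overlap ** 2 / norm2

def exact_infidelity(p0, p1, eps):
    """Closed form of 1 - F derived above."""
    return 4 * p0 * p1 * math.sin(eps) ** 2 / (1 + (p0 - p1) * math.sin(2 * eps))

for eps in (0.1, 0.01, 0.001):
    print(eps, 1 - fidelity_after_one_wqmd(0.6, 0.8, eps))
```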

The last thing to verify is that we can choose $K$ large enough to yield the same probability of obtaining the result $i$ as in a standard measurement.

If the state of the probe is $|i\rangle$, then each of the WQMDs is in the pure state $|i'\rangle$. When we measure $K$ WQMDs, their outcomes are independent and identically distributed. Each gives the correct result $i$ with probability $\cos^2\theta$, so the number of correct outcomes follows a binomial distribution. Let us assume that $K$ is odd. When we look at $K$ such WQMDs and take a majority vote, the probability of obtaining the correct result is

$$ \sum_{k=0}^{(K-1)/2} \binom{K}{k} \cos^{2(K-k)}\theta\, \sin^{2k}\theta. \tag{A9} $$

(Note that this expression can also be calculated by expanding

$$ |\phi_i\rangle|i\rangle|i'\rangle_1\cdots|i'\rangle_K = |\phi_i\rangle|i\rangle\big[\cos^K\theta\,|i\cdots i\rangle + \cos^{K-1}\theta\sin\theta\,|i\cdots ij\rangle + \cdots + \sin^K\theta\,|j\cdots j\rangle\big], \tag{A10} $$

with $j = 0$ if $i = 1$ and vice versa, and calculating the probability of each string.)

This is equivalent to the classical problem of having a coin known to favor one side with probability $\cos^2\theta$, and trying to guess which side it favors. One can bound the above sum directly, or approximate it using the central limit theorem (since the count of correct outcomes performs a random walk).
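The majority-vote probability of Eq. (A9) is easy to evaluate directly. This sketch sums the binomial terms in log space, using `math.lgamma`, to avoid overflow of the binomial coefficients at large $K$. It shows the success probability climbing from $\cos^2\theta$ at $K = 1$ toward one. The value $\epsilon = 0.05$ in the usage loop is an arbitrary illustration.

```python
import math

def majority_success(K, eps):
    """Eq. (A9): probability that a majority vote over K (odd) WQMD readouts is correct,
    each readout being correct independently with probability cos^2(theta),
    theta = pi/4 - eps, with 0 < eps < pi/4. Terms are summed in log space."""
    if K % 2 == 0:
        raise ValueError("K is assumed odd, as in the text")
    c2 = math.cos(math.pi / 4 - eps) ** 2
    log_c2, log_s2 = math.log(c2), math.log1p(-c2)
    total = 0.0
    for k in range((K - 1) // 2 + 1):      # k = number of wrong readouts
        log_binom = math.lgamma(K + 1) - math.lgamma(k + 1) - math.lgamma(K - k + 1)
        total += math.exp(log_binom + (K - k) * log_c2 + k * log_s2)
    return total

for K in (1, 101, 1001, 4001):
    print(K, majority_success(K, 0.05))
```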
Alternatively, one can use a strong version of the law of large numbers, which tells us that we can guess the direction of the bias with probability exponentially close to one. Suppose one tosses a biased coin, so that in one try it gives $\mathrm{Prob}(x = 1) = p$ and $\mathrm{Prob}(x = 0) = 1 - p$. According to Bernstein's law of large numbers [40], when the same coin is tossed $K$ times, the actual average of the $K$ trials, $\sum_{i=1}^{K} x_i/K$, is very close to the expectation value $p$, except with probability

$$ \mathrm{Prob}\Big[\Big|\sum_{i=1}^{K} x_i/K - p\Big| > \delta\Big] < 2e^{-K\delta^2}, \tag{A11} $$

for any $K$ and for $\delta$ smaller than $p(1-p)$.

To apply this law to our case, recall that $\cos^2\theta = 1/2 + \sin(2\epsilon)/2$ and $\sin^2\theta = 1/2 - \sin(2\epsilon)/2$. Hence $p(1-p) = 1/4 - \sin^2(2\epsilon)/4$, and Bernstein's law applies for any $\delta < 1/4 - \sin^2(2\epsilon)/4$. For small $\epsilon$ (e.g., less than 1/8) we choose $\delta = \sin(2\epsilon)/2$, which is in the appropriate range. Now, the probability of observing $\sum_i x_i/K > 1/2$ when $\mathrm{Prob}(x = 1) = \sin^2\theta$ is less than or equal to

$$ \mathrm{Prob}\Big[\Big|\sum_{i=1}^{K} x_i/K - p\Big| > \tfrac{\sin(2\epsilon)}{2}\Big] < 2e^{-K\sin^2(2\epsilon)/4}. \tag{A12} $$

Since $K$ can be chosen freely once $\epsilon$ is fixed, any $K \gg 4/\sin^2(2\epsilon)$ will do.
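The bound of Eq. (A12) can be compared with the exact probability of a wrong majority, which is $1 - S$ in the notation that follows. This is a Python sketch with an illustrative $\epsilon = 0.05$. Hoeffding's inequality, a standard result, gives the stronger one-sided bound $e^{-K\sin^2(2\epsilon)/2}$ for this event. The weaker bound of Eq. (A12) should therefore hold for every odd $K$, not only for $K \gg 4/\sin^2(2\epsilon)$, where it first becomes small.

```python
import math

def wrong_majority(K, eps):
    """Exact probability that more than half of K (odd) readouts are wrong, each wrong
    independently with probability sin^2(theta) = (1 - sin 2eps)/2; this is 1 - S."""
    p = (1 - math.sin(2 * eps)) / 2
    total = 0.0
    for k in range((K + 1) // 2, K + 1):     # k = number of wrong readouts
        log_binom = math.lgamma(K + 1) - math.lgamma(k + 1) - math.lgamma(K - k + 1)
        total += math.exp(log_binom + k * math.log(p) + (K - k) * math.log1p(-p))
    return total

def bound_a12(K, eps):
    """Right-hand side of Eq. (A12)."""
    return 2 * math.exp(-K * math.sin(2 * eps) ** 2 / 4)

eps = 0.05
for K in (11, 101, 1001, 4001):
    print(K, wrong_majority(K, eps), bound_a12(K, eps))
```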
This means that for such $K$ the expression

$$ S = \sum_{k=0}^{(K-1)/2} \binom{K}{k} \cos^{2(K-k)}\theta\, \sin^{2k}\theta \tag{A13} $$

is exponentially close to 1, and its complement

$$ 1 - S = \sum_{k=0}^{(K-1)/2} \binom{K}{k} \sin^{2(K-k)}\theta\, \cos^{2k}\theta \tag{A14} $$

is exponentially small.

In the general case of a state $|\psi\rangle$, we need to expand the state (as was done above) and calculate the probability of each string in order to take a majority vote as before. This process yields (assuming as before odd $K$) a probability of

$$ |a_i|^2 \sum_{k=0}^{(K-1)/2} \binom{K}{k} \cos^{2(K-k)}\theta\, \sin^{2k}\theta + (1 - |a_i|^2) \sum_{k=0}^{(K-1)/2} \binom{K}{k} \sin^{2(K-k)}\theta\, \cos^{2k}\theta \tag{A15} $$

to obtain the result $i$. Using $S$ we get

$$ |a_i|^2 S + (1 - |a_i|^2)(1 - S). \tag{A16} $$

Since $S$ is exponentially close to 1, the result $i$ is obtained with a probability exponentially close to the correct value $|a_i|^2$. This is equivalent to receiving, with probability $|a_i|^2$, a coin with bias $\cos^2\theta$, or, with probability $1 - |a_i|^2$, a coin with the opposite bias $\sin^2\theta$. One then tosses it as many times as needed to learn which coin was received with any desired probability of success.



APPENDIX B: CONSTRAINTS FROM APPROXIMATE ORTHOGONALITY OF RESIDUAL STATES 

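According to Eq. (20), the overlaps between the residual states after stage I (Eq. (13)) are all bounded by $\delta$:

$$ \frac{|\langle\psi_i|a\otimes b|\psi_j\rangle|}{\sqrt{\langle\psi_i|a\otimes b|\psi_i\rangle\,\langle\psi_j|a\otimes b|\psi_j\rangle}} \le \delta, \qquad i \neq j. \tag{B1} $$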

The task here is to use these inequalities to derive various constraints on the matrix elements of the operators a and 
b in Eq. (17). 

We note before we begin that during the completion of stage I, Alice and Bob may each have augmented their Hilbert spaces beyond their original three dimensions. They might do this, for instance, as part of a strategy that requires retaining some of the quantum ancillae from one round of the protocol to the next. Such a strategy finds its expression in the fact that the $S_{m_I}$ operators need not be square matrices, so that the states of Eq. (13) will live in a Hilbert space larger than the original 9-dimensional one. Fortunately, this contingency has no relevance for the constraints we are about to derive. Only the algebraic properties of $E = S_{m_I}^\dagger S_{m_I} = a\otimes b$ concern us, and $a$ and $b$ are always square matrices whose dimensions are determined by the size of the initial Hilbert space.

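Let us use the notation $\langle i|a|j\rangle = a_{ij}$ and $\langle i|b|j\rangle = b_{ij}$, and note the following preliminary facts. Recall that $a$ and $b$ are both positive semidefinite operators, so that, for each $i$, $\langle\psi_i|a\otimes b|\psi_i\rangle \ge 0$. Recall also that, from Eq. (18), we have for each $i$ and $j$

$$ \frac{\langle\psi_i|a\otimes b|\psi_i\rangle}{\langle\psi_j|a\otimes b|\psi_j\rangle} \le \frac{1+9\epsilon}{1-72\epsilon}. \tag{B2} $$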

Thus it follows that $a_{00}$, $a_{22}$, $b_{00}$, and $b_{22}$ are all strictly positive. Moreover, taking $i = 2$ and $j = 3$ in Eq. (B2), for instance, gives

$$\frac{b_{00}+b_{11}+2\,\mathrm{Re}\,b_{10}}{b_{00}+b_{11}-2\,\mathrm{Re}\,b_{10}} \le \frac{1+9\epsilon}{1-72\epsilon} . \tag{B3}$$

From this and the $i = 3$, $j = 2$ condition,

$$|2\,\mathrm{Re}\,b_{10}| \le \frac{81\epsilon}{2-63\epsilon}\,(b_{11}+b_{00}) . \tag{B4}$$
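The algebra behind the step from Eq. (B3) to Eq. (B4) is short. Write $x = b_{00}+b_{11}$, $y = 2\,\mathrm{Re}\,b_{10}$ and $r = (1+9\epsilon)/(1-72\epsilon)$. Both expectation values in Eq. (B3) are positive, so $x - y > 0$:

```latex
\frac{x+y}{x-y} \le r
\;\Longrightarrow\; y\,(1+r) \le x\,(r-1)
\;\Longrightarrow\; y \le \frac{r-1}{r+1}\,x = \frac{81\epsilon}{2-63\epsilon}\,x ,
\qquad
r-1 = \frac{81\epsilon}{1-72\epsilon},\quad
r+1 = \frac{2-63\epsilon}{1-72\epsilon} .
```

The $i = 3$, $j = 2$ condition gives the same inequality with $y \to -y$. Together, the two yield Eq. (B4).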

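In similar fashion, taking $\{i, j\} = \{2k, 2k+1\}$ for $k = 2, 3, 4$, we have

$$|2\,\mathrm{Re}\,b_{21}| \le \frac{81\epsilon}{2-63\epsilon}\,(b_{22}+b_{11}) , \tag{B5}$$

$$|2\,\mathrm{Re}\,a_{21}| \le \frac{81\epsilon}{2-63\epsilon}\,(a_{22}+a_{11}) , \tag{B6}$$

$$|2\,\mathrm{Re}\,a_{10}| \le \frac{81\epsilon}{2-63\epsilon}\,(a_{11}+a_{00}) . \tag{B7}$$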

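We can now bound the relative variations among the diagonal elements of $a$ and $b$ in terms of $\delta$ in the following way. Taking $i = 2$ and $j = 3$, Eq. (B1) reads

$$\frac{a_{00}\,|b_{00}-b_{01}+b_{10}-b_{11}|}{\sqrt{a_{00}(b_{00}+b_{01}+b_{10}+b_{11})\;a_{00}(b_{00}-b_{01}-b_{10}+b_{11})}} \le \delta \tag{B8}$$

and simplifies to

$$\frac{|b_{00}-b_{11}+2i\,\mathrm{Im}\,b_{10}|}{\sqrt{(b_{00}+b_{11})^2-(2\,\mathrm{Re}\,b_{10})^2}} \le \delta . \tag{B9}$$

Therefore it follows that

$$\frac{|b_{00}-b_{11}|}{b_{00}+b_{11}} \le \delta . \tag{B10}$$

In similar fashion, we get

$$\frac{|b_{11}-b_{22}|}{b_{11}+b_{22}} \le \delta , \qquad \frac{|a_{11}-a_{22}|}{a_{11}+a_{22}} \le \delta , \qquad \frac{|a_{00}-a_{11}|}{a_{00}+a_{11}} \le \delta . \tag{B11}$$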
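The passage from Eq. (B9) to Eq. (B10) uses two facts. First, $b_{00}$ and $b_{11}$ are real because $b$ is Hermitian. Second, the denominator of Eq. (B9) is at most $b_{00}+b_{11}$:

```latex
|b_{00}-b_{11}|
  = \big|\mathrm{Re}\,(b_{00}-b_{11}+2i\,\mathrm{Im}\,b_{10})\big|
  \le |b_{00}-b_{11}+2i\,\mathrm{Im}\,b_{10}|
  \le \delta\sqrt{(b_{00}+b_{11})^2-(2\,\mathrm{Re}\,b_{10})^2}
  \le \delta\,(b_{00}+b_{11}) .
```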

These inequalities help us bound the off-diagonal matrix elements of $a$ and $b$ in terms of $\delta$. Consider the combination of the conditions from Eq. (B1) given by taking $i = 2, 3$ and $j = 4, 5$. The $i = 2$, $j = 4$ inequality, for instance, is

$$\frac{|a_{02}(b_{02}+b_{01}+b_{12}+b_{11})|}{\sqrt{a_{00}a_{22}(b_{00}+b_{11}+2\,\mathrm{Re}\,b_{10})(b_{22}+b_{11}+2\,\mathrm{Re}\,b_{21})}} \le \delta . \tag{B12}$$

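It will be convenient to introduce the notation

$$D_{\pm\pm}^2 = a_{00}a_{22}\big((b_{00}+b_{11}) \pm 2\,\mathrm{Re}\,b_{10}\big)\big((b_{11}+b_{22}) \pm 2\,\mathrm{Re}\,b_{21}\big) , \tag{B13}$$

where $D_{\pm\pm} \ge 0$, the first subscript sign accompanies $\mathrm{Re}\,b_{10}$, and the second accompanies $\mathrm{Re}\,b_{21}$.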
With this, we see that we can write

$$a_{02}(b_{02}+b_{01}+b_{12}+b_{11}) = D_{++}\,\gamma^{(1)}e^{i\phi^{(1)}} , \tag{B14}$$

where $\gamma^{(1)}$ is a small amplitude and $\phi^{(1)}$ is an appropriately chosen phase that satisfy the constraints

$$0 \le \gamma^{(1)} \le \delta \quad\text{and}\quad 0 \le \phi^{(1)} < 2\pi . \tag{B15}$$
In similar fashion, taking the remaining combinations of $i = 2, 3$ and $j = 4, 5$, we arrive at

$$a_{02}(-b_{02}+b_{01}-b_{12}+b_{11}) = D_{+-}\,\gamma^{(2)}e^{i\phi^{(2)}} , \tag{B16}$$

$$a_{02}(-b_{02}-b_{01}+b_{12}+b_{11}) = D_{-+}\,\gamma^{(3)}e^{i\phi^{(3)}} , \tag{B17}$$

$$a_{02}(b_{02}-b_{01}-b_{12}+b_{11}) = D_{--}\,\gamma^{(4)}e^{i\phi^{(4)}} , \tag{B18}$$

where each $\gamma^{(k)}$ and $\phi^{(k)}$ satisfies the same constraints as in Eq. (B15). Adding Eqs. (B14) and (B16)-(B18) together and taking the absolute value of the result, we get

$$4|a_{02}b_{11}| \le \delta\,(D_{++}+D_{+-}+D_{-+}+D_{--}) . \tag{B19}$$
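The cancellation behind Eq. (B19) is purely algebraic: the four sign patterns in Eqs. (B14) and (B16)-(B18) cancel every entry except $b_{11}$. The Python sketch below checks this with random complex entries. The entries need not come from a positive operator, because the identity does not depend on that. The helper names are hypothetical.

```python
import random


def lhs_terms(a02, b):
    """Left-hand sides of Eqs. (B14), (B16), (B17), (B18); b maps (row, col) -> entry."""
    return [
        a02 * (b[0, 2] + b[0, 1] + b[1, 2] + b[1, 1]),    # (B14)
        a02 * (-b[0, 2] + b[0, 1] - b[1, 2] + b[1, 1]),   # (B16)
        a02 * (-b[0, 2] - b[0, 1] + b[1, 2] + b[1, 1]),   # (B17)
        a02 * (b[0, 2] - b[0, 1] - b[1, 2] + b[1, 1]),    # (B18)
    ]


def rand_c(rng):
    """Random complex number with real and imaginary parts in [-1, 1]."""
    return complex(rng.uniform(-1, 1), rng.uniform(-1, 1))


rng = random.Random(0)
a02 = rand_c(rng)
b = {(r, c): rand_c(rng) for r in range(3) for c in range(3)}
total = sum(lhs_terms(a02, b))
# Rounding-level number: only the b11 terms survive the sum.
print(abs(total - 4 * a02 * b[1, 1]))
```

Applying the triangle inequality to this sum, term by term, with $|\gamma^{(k)}e^{i\phi^{(k)}}| \le \delta$, gives Eq. (B19).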

Now suppose that $\mathrm{Re}\,b_{10} \ge 0$ and $\mathrm{Re}\,b_{21} \ge 0$, and let

$$z = 1 + \frac{81\epsilon}{2-63\epsilon} = \frac{2+18\epsilon}{2-63\epsilon} . \tag{B20}$$

Then it follows from Eqs. (B4) and (B5) that

$$D_{++}^2 \le z^2\,a_{00}a_{22}(b_{00}+b_{11})(b_{11}+b_{22}) , \tag{B21}$$

$$D_{+-}^2 \le z\,a_{00}a_{22}(b_{00}+b_{11})(b_{11}+b_{22}) , \tag{B22}$$

$$D_{-+}^2 \le z\,a_{00}a_{22}(b_{00}+b_{11})(b_{11}+b_{22}) , \tag{B23}$$

$$D_{--}^2 \le a_{00}a_{22}(b_{00}+b_{11})(b_{11}+b_{22}) . \tag{B24}$$

Combining this with Eq. (B19), we find

$$\frac{|a_{02}|}{\sqrt{a_{00}a_{22}}} \le \frac{\delta}{4}\,(z+2\sqrt{z}+1)\,\frac{\sqrt{(b_{00}+b_{11})(b_{11}+b_{22})}}{b_{11}} . \tag{B25}$$

Note that Eq. (B25) remains true regardless of the signs of $\mathrm{Re}\,b_{10}$ and $\mathrm{Re}\,b_{21}$. The reason is that the right-hand side of Eq. (B19) is invariant under a change of sign of either or both of these terms. Now it is just a question of using the previously derived constraints on the diagonal elements of $a$ and $b$ to limit how large the right-hand side of Eq. (B25) can be. A little experimentation shows that the maximum occurs when

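$$b_{00} = b_{22} = b_{11}\,\frac{1+\delta}{1-\delta} , \tag{B26}$$

and, at that point, one has

$$\frac{|a_{02}|}{\sqrt{a_{00}a_{22}}} \le \frac{1}{2}\,(z+2\sqrt{z}+1)\,\frac{\delta}{1-\delta} . \tag{B27}$$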
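Substituting Eq. (B26) into Eq. (B25) is a short computation. The bounds on $b_{00}$ and $b_{22}$ come from Eqs. (B10) and (B11):

```latex
b_{00},\,b_{22} \le b_{11}\,\frac{1+\delta}{1-\delta}
\;\Longrightarrow\;
\frac{\sqrt{(b_{00}+b_{11})(b_{11}+b_{22})}}{b_{11}}
  \le 1+\frac{1+\delta}{1-\delta} = \frac{2}{1-\delta} ,
\qquad
\frac{\delta}{4}\,(z+2\sqrt{z}+1)\,\frac{2}{1-\delta}
  = \frac{1}{2}\,(z+2\sqrt{z}+1)\,\frac{\delta}{1-\delta} .
```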



Similarly, taking $i = 6, 7$ and $j = 8, 9$ in Eq. (B1) and running through a set of steps analogous to those in Eqs. (B12) through (B27), one finds

$$\frac{|b_{02}|}{\sqrt{b_{00}b_{22}}} \le \frac{1}{2}\,(z+2\sqrt{z}+1)\,\frac{\delta}{1-\delta} . \tag{B28}$$

By a slightly more elaborate strategy, we can now find bounds on all the remaining off-diagonal terms. Let us consider the inequalities derived from Eq. (B1) for $i = 4, 5$ and $j = 8, 9$. These can all be written in a compact notation as

$$\frac{|(a_{20}+(-1)^{s_1}a_{21})(b_{22}+(-1)^{s_2}b_{12})|}{\sqrt{a_{22}\big(a_{00}+a_{11}+(-1)^{s_1}2\,\mathrm{Re}\,a_{01}\big)\big(b_{22}+b_{11}+(-1)^{s_2}2\,\mathrm{Re}\,b_{12}\big)\,b_{22}}} \le \delta . \tag{B29}$$

The sign bit $s_1 = 0, 1$ corresponds to $j = 8, 9$; the bit $s_2 = 0, 1$ corresponds to $i = 4, 5$. Let us focus on only one of these four inequalities, one for which

$$(-1)^{s_1}\,\mathrm{Re}\,a_{01} \le 0 , \tag{B30}$$

$$(-1)^{s_2}\,\mathrm{Re}\,b_{12} \ge 0 . \tag{B31}$$

It is clear that at least one of the four sign choices will satisfy these conditions. In that case, it follows that

$$\frac{|a_{20}+(-1)^{s_1}a_{21}|}{\sqrt{a_{22}(a_{00}+a_{11})}} \le \delta\,\sqrt{\frac{1}{b_{22}}\big(b_{22}+b_{11}+(-1)^{s_2}2\,\mathrm{Re}\,b_{12}\big)} . \tag{B32}$$



Using Eq. (B5), this implies

$$\frac{|a_{20}+(-1)^{s_1}a_{21}|}{\sqrt{a_{22}(a_{00}+a_{11})}} \le \delta\,\sqrt{\frac{z}{b_{22}}\,(b_{22}+b_{11})} . \tag{B33}$$



Maximizing the right-hand side of this subject to the constraint of Eq. (B11) gives

$$\frac{|a_{20}+(-1)^{s_1}a_{21}|}{\sqrt{a_{22}(a_{00}+a_{11})}} \le \delta\,\sqrt{\frac{2z}{1-\delta}} . \tag{B34}$$

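Hence, using Eq. (B27),

$$\frac{|a_{21}|}{\sqrt{a_{22}(a_{00}+a_{11})}} \le \delta\,\sqrt{\frac{2z}{1-\delta}} + \frac{|a_{20}|}{\sqrt{a_{22}(a_{00}+a_{11})}} \tag{B35}$$

$$\le \delta\,\sqrt{\frac{2z}{1-\delta}} + \frac{1}{2}\,(z+2\sqrt{z}+1)\,\frac{\delta}{1-\delta}\,\sqrt{\frac{a_{00}}{a_{00}+a_{11}}} . \tag{B36}$$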

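Finally, optimizing the left- and right-hand sides of this subject to the constraints imposed on $a_{00}$ by Eq. (B11), we obtain

$$\frac{|a_{21}|}{\sqrt{a_{22}a_{11}}} \le \frac{2\delta}{1-\delta}\left[\sqrt{z} + \frac{1}{4}\sqrt{\frac{1+\delta}{1-\delta}}\,(z+2\sqrt{z}+1)\right] . \tag{B37}$$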

This is the desired bound. Applying exactly the same reasoning to Eq. (B1) with $i = 6, 7$ and $j = 4, 5$, we find the same bound on $|b_{10}|/\sqrt{b_{11}b_{00}}$. Doing the same with $i = 2, 3$ and $j = 6, 7$, we find it for $|a_{01}|/\sqrt{a_{00}a_{11}}$. Finally, with $i = 8, 9$ and $j = 2, 3$, we find it for $|b_{12}|/\sqrt{b_{11}b_{22}}$.
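The bounds of Eqs. (B20), (B27) and (B37) are explicit enough to evaluate numerically. The point of the sketch below is that all of them vanish as $\epsilon, \delta \to 0$, so exact orthogonality with unchanged posteriors forces every off-diagonal ratio to zero. The function names are hypothetical, and `bound_b37` encodes Eq. (B37) in the form written above.

```python
import math


def z_of(eps):
    """Eq. (B20): z = 1 + 81 eps / (2 - 63 eps) = (2 + 18 eps) / (2 - 63 eps)."""
    if not 0 <= eps < 1 / 72:
        raise ValueError("Eq. (B2) requires 1 - 72*eps > 0")
    return (2 + 18 * eps) / (2 - 63 * eps)


def bound_b27(eps, delta):
    """Right-hand side of Eqs. (B27)/(B28), e.g. a bound on |a02|/sqrt(a00 a22)."""
    z = z_of(eps)
    return 0.5 * (z + 2 * math.sqrt(z) + 1) * delta / (1 - delta)


def bound_b37(eps, delta):
    """Right-hand side of Eq. (B37), e.g. a bound on |a21|/sqrt(a11 a22)."""
    z = z_of(eps)
    stretch = math.sqrt((1 + delta) / (1 - delta))
    return (2 * delta / (1 - delta)) * (
        math.sqrt(z) + 0.25 * stretch * (z + 2 * math.sqrt(z) + 1)
    )


for eps, delta in [(1e-3, 1e-2), (1e-6, 1e-4), (0.0, 0.0)]:
    print(eps, delta, bound_b27(eps, delta), bound_b37(eps, delta))
```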



APPENDIX C: COMPRESSIBILITY OF CLASSICAL ADVICE 



To see how negative advice of the form "not state $j$" can be asymptotically compressed, consider first the simple case of the equiprobable 8-state ensemble. Suppose Alice and Bob are faced with the task of performing a large number $n$ of the 8-state measurements. They are promised that $\psi_1$ does not occur and that all other states are equiprobable (this is the simplest case). Then they must ultimately distinguish $8^n$ possible outcomes. But one single string of $n$ hints (e.g., state 1 is not $\psi_2$, state 2 is not $\psi_9$, state 3 is not $\psi_2$, etc.) successfully covers $7^n$ of the possible outcomes. Thus, only approximately $(8/7)^n$ distinct hint strings need ever be used to help Alice and Bob with their measurements. Suppose Alice, Bob and the hint-giver pre-agree on which hint strings are to be used and on a numbering of them (which amounts to the selection of an expanding hash function). Then the hint can be conveyed in $n\log_2(8/7)$ bits, or $\log_2(8/7) \approx 0.193$ bits per measurement.
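The counting claim can be checked by brute force for small $n$. The Python sketch below enumerates all $8^n$ state sequences and counts those for which a fixed hint string is entirely true. It does not reproduce the hash-function construction that achieves the rate asymptotically, and the helper names are hypothetical.

```python
import itertools
import math

STATES = range(2, 10)  # the 8-state ensemble: psi_2 ... psi_9 (psi_1 excluded)


def covered(hints, n):
    """Number of length-n state sequences for which every hint "not state h" is true."""
    return sum(
        all(s != h for s, h in zip(seq, hints))
        for seq in itertools.product(STATES, repeat=n)
    )


n = 4
hints = (2, 9, 2, 5)  # "state 1 is not psi_2, state 2 is not psi_9, ..."
print(covered(hints, n), 7**n, 8**n)  # one hint string covers 7^n of the 8^n outcomes
print(f"{math.log2(8 / 7):.3f}")      # bits of advice per measurement
```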

For general, not necessarily equal, prior probabilities $p_i$ of the 9 states, more sophisticated counting methods are required to calculate the compressibility of the hints. Let $X$ be a typical sequence of $n$ states chosen independently with probabilities $\{p_i, i = 1, \dots, 9\}$, having about $np_i$ states of type $i$ for each $i$. Let $Y$ be a sequence of $n$ hints of the form "the state is not state $j$", chosen independently with probabilities $\{q_j, j = 2, \dots, 9\}$. A hint sequence $Y$ is valid for a state sequence $X$ if none of the hints is false. For example, if $X = 136$, then $Y = 353$ is valid but $Y = 356$ is not, because the last hint is false. The probability that the hint sequence $Y$ will be valid for the state sequence $X$ is $\prod_{i=2}^{9}(1-q_i)^{np_i}$. Therefore, using an expanding hash function from an appropriate strongly 2-universal class [41], one can show that $-\sum_{i=2}^{9} p_i\log_2(1-q_i)$ bits of advice per state are asymptotically necessary and sufficient to specify a valid hint sequence for a typical $X$. The optimal compression for hints of this sort can then be obtained by varying the hint probabilities $q_j$ to minimize the above expression. When this is done, a clear pattern emerges: if one or more of the states $\psi_i$ occurs with probability significantly higher than average, the corresponding hint "not $\psi_i$" should never be used. That is, the corresponding hint probability vanishes.
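One way to carry out this minimization is sketched below; the method is our own derivation and is not stated in the text. The objective $-\sum_i p_i\log_2(1-q_i)$ is convex in $q$. Its KKT conditions under $\sum_j q_j = 1$, $q_j \ge 0$ give a water-filling form, $q_j = \max(0, 1 - p_j/\lambda)$, with $\lambda$ fixed by normalization. This form reproduces the qualitative claim above: a state with $p_j \ge \lambda$ gets $q_j = 0$. The function names are hypothetical.

```python
import math


def hint_rate(p, q):
    """Bits of advice per state, -sum_j p_j log2(1 - q_j), over the hint labels j in q."""
    return -sum(p[j] * math.log2(1.0 - q[j]) for j in q)


def optimal_hints(p, tol=1e-12):
    """Minimize hint_rate over distributions q on the hint labels j = 2..9.

    Assumes p[j] > 0 for every j in 2..9. Water-filling solution of the KKT
    conditions: q_j = max(0, 1 - p_j / lam), with lam found by bisection so that
    sum_j q_j = 1.
    """
    labels = range(2, 10)

    def total(lam):
        return sum(max(0.0, 1.0 - p[j] / lam) for j in labels)

    lo = min(p[j] for j in labels)        # total(lo) == 0
    hi = 2.0 * max(p[j] for j in labels)  # every term >= 1/2, so total(hi) >= 4
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if total(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    q = {j: max(0.0, 1.0 - p[j] / lam) for j in labels}
    norm = sum(q.values())
    return {j: v / norm for j, v in q.items()}  # absorb residual bisection error


uniform9 = {i: 1 / 9 for i in range(1, 10)}
heavy = {1: 0.0, 2: 0.5, **{j: 0.5 / 7 for j in range(3, 10)}}
for prior in (uniform9, heavy):
    q = optimal_hints(prior)
    print(q[2], hint_rate(prior, q))
```

For the heavy prior, the hint "not $\psi_2$" receives probability zero, as the text anticipates. For the equiprobable 8-state ensemble ($p_1 = 0$, $p_j = 1/8$), the rate returns to $\log_2(8/7)$.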



APPENDIX D: DEPHASING SUPEROPERATOR RULED OUT BY INVERTIBILITY RESULT 

In this Appendix we show that the superoperator which dephases in the nine-state basis, Eq. (61), cannot be 
implemented bilocally by Alice and Bob. As a preliminary, we prove that this superoperator cannot be performed 
with no classical communication between Alice and Bob. Consider two possible input states to the superoperator, 
$\frac{1}{\sqrt{2}}(\psi_8 - \psi_9)$ and $\psi_1$ (see Eq. (3)). These states have the same reduced density operator for Alice. If there is no communication, the two output states must therefore also have the same reduced density operator for Alice. But the dephasing superoperator requires that they be different: in the first case $\frac{1}{2}(|0\rangle\langle 0| + |1\rangle\langle 1|)$, and in the second $|1\rangle\langle 1|$. Thus, the superoperator is not doable without communication.
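The two reduced density operators quoted above can be checked directly. The Python sketch below dephases a pure input in the nine-state basis and traces out Bob. Because each $\psi_i$ is a product $\alpha_i\otimes\beta_i$, Alice's marginal is $\sum_i p_i|\alpha_i\rangle\langle\alpha_i|$ with $p_i = |\langle\psi_i|\mathrm{in}\rangle|^2$. It assumes the explicit form $\psi_1 = |1\rangle|1\rangle$, $\psi_{2,3} = |0\rangle|0\pm1\rangle$, $\psi_{4,5} = |2\rangle|1\pm2\rangle$, $\psi_{6,7} = |1\pm2\rangle|0\rangle$, $\psi_{8,9} = |0\pm1\rangle|2\rangle$ for Eq. (3), under which $\frac{1}{\sqrt{2}}(\psi_8-\psi_9) = |1\rangle|2\rangle$. The helper names are hypothetical.

```python
import math

S = 1 / math.sqrt(2)
E = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]


def superpose(u, v, sign):
    """(u + sign*v)/sqrt(2) for orthonormal basis vectors u, v."""
    return [S * (x + sign * y) for x, y in zip(u, v)]


# Assumed form of Eq. (3): (Alice factor, Bob factor) of each psi_i, i = 1..9.
PSI = [
    (E[1], E[1]),
    (E[0], superpose(E[0], E[1], +1)), (E[0], superpose(E[0], E[1], -1)),
    (E[2], superpose(E[1], E[2], +1)), (E[2], superpose(E[1], E[2], -1)),
    (superpose(E[1], E[2], +1), E[0]), (superpose(E[1], E[2], -1), E[0]),
    (superpose(E[0], E[1], +1), E[2]), (superpose(E[0], E[1], -1), E[2]),
]


def kron(u, v):
    """Tensor product of two 3-vectors, index 3*r + c."""
    return [x * y for x in u for y in v]


def alice_after_dephasing(state):
    """Alice's reduced density matrix after dephasing a pure input in the psi basis.

    All amplitudes here are real, so no complex conjugation is needed.
    """
    rho = [[0.0] * 3 for _ in range(3)]
    for alpha, beta in PSI:
        p = abs(sum(x * y for x, y in zip(kron(alpha, beta), state))) ** 2
        for r in range(3):
            for c in range(3):
                rho[r][c] += p * alpha[r] * alpha[c]
    return rho


print(alice_after_dephasing(kron(E[1], E[2])))  # input (psi_8 - psi_9)/sqrt(2)
print(alice_after_dephasing(kron(E[1], E[1])))  # input psi_1
```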

Now we consider the case where some data stream m (see Eq. (10)) has passed between Alice and Bob. Without 
loss of generality, we can assume that all the data transmission occurs before Alice and Bob trace out any of the ancilla 
Hilbert spaces that they have introduced (recall that the output space of Eq. (61) must be the same $3\times3$ space as the input). Now, adopting the "continuumized" view of superoperators that involve channel transmissions introduced in Sec. II D, we proceed with the proof by considering two separate cases:

1) For inputs drawn from the nine states $\psi_i$, the residual states $S_m|\psi_i\rangle$ (Eq. (13)) become nonorthogonal at some instant, without any of them being annihilated (non-annihilation is an obvious requirement of the dephasing superoperator).

2) The residual states always remain orthogonal.

1) Disproving the bilocality of the dephasing superoperator in the case where residual states become nonorthogonal 
is accomplished by the following discussion of invertibility for superoperators. 

Definition: A superoperator $S = \{S_i\}$ is weakly invertible relative to a set $V$ of pure states if there exist superoperators $T_i = \{T_{ij}\}$, one for each $i$, such that the superoperator $U = \{T_{ij}S_i\}$ satisfies

$$U(|v_k\rangle\langle v_k|) = |v_k\rangle\langle v_k| \tag{D1}$$

for all $|v_k\rangle \in V$. Note that the conventional projection superoperator of Eq. (61) is one such operator of the form $U$.

Since the dephasing operator of Eq. (61) is an example of an operator of the form of $U$ in Eq. (D1), any partial
completion of the superoperator up to some instant, in particular the instant at which the residual states become 
nonorthogonal, must be weakly invertible. But we can easily contradict this with the following lemma: 

If the superoperator $S = \{S_i\}$ is weakly invertible relative to the set $V$ of pure states, then for all $|v_1\rangle, |v_2\rangle \in V$: if $|v_1\rangle$ and $|v_2\rangle$ are orthogonal, then so are $S_i|v_1\rangle$ and $S_i|v_2\rangle$, for all $i$.

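Proof: Let $T_i$ be superoperators demonstrating the weak invertibility of $S$. Then, by definition, for all $|v\rangle$ in $V$,

$$\sum_{ij} T_{ij}S_i|v\rangle\langle v|S_i^\dagger T_{ij}^\dagger = |v\rangle\langle v| . \tag{D2}$$

Each term on the left is a positive operator, and their sum is a rank-one projector, so each term must be proportional to it. This implies that, for all $i, j$,

$$T_{ij}S_i|v\rangle = \alpha_{ij}(v)\,|v\rangle \tag{D3}$$

for some scalar $\alpha_{ij}(v)$. But then, for orthogonal $|v_1\rangle, |v_2\rangle \in V$,

$$\langle v_2|S_i^\dagger T_{ij}^\dagger T_{ij}S_i|v_1\rangle = \alpha_{ij}(v_1)\,\big(\alpha_{ij}(v_2)\big)^*\,\langle v_2|v_1\rangle = 0 , \tag{D4}$$

so, summing over $j$ and using the completeness relation $\sum_j T_{ij}^\dagger T_{ij} = I$,

$$\sum_j \langle v_2|S_i^\dagger T_{ij}^\dagger T_{ij}S_i|v_1\rangle = \langle v_2|S_i^\dagger S_i|v_1\rangle = 0 . \tag{D5}$$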

2) Disproving the bilocality of the dephasing superoperator in the case where residual states always remain orthogonal throughout the period when Alice and Bob are communicating through the channel requires a different line of argument from case 1. First, we note that the calculation of Appendix B shows that if the states remain exactly orthogonal (cf. Eq. (B1)), then each operator $a_m$ and $b_m$ must be exactly proportional to the identity operator. This in turn implies that each operation element is proportional to a product of an Alice and a Bob unitary operator,

$$S_m = \alpha_m\,U_{mA}\otimes U_{mB} . \tag{D6}$$

Note that by the conditions of Appendix B, the posterior probabilities must remain finite for this result to hold. As noted before, however, if this condition were not satisfied, it could be immediately argued that the superoperator could never result in the desired dephasing operator. In fact, of course, using Eqs. (11,12) it is straightforward to show that the posterior probabilities $p(\psi_i|m)$ remain identical to the prior probabilities $p(\psi_i)$: no information about the states ever flows through the classical channel.

Given that the superoperator is constrained to be of the form of Eq. (D6), it is easy to complete the proof. Eq. (D6) implies, for each state $\psi_i = |\alpha_i\rangle\otimes|\beta_i\rangle$ of Eq. (3),

$$U_{mA}\otimes U_{mB}\,|\alpha_i\otimes\beta_i\rangle = e^{i\theta_i}\,|\alpha_i\otimes\beta_i\rangle . \tag{D7}$$

This implies

$$U_{mA}|\alpha_i\rangle = e^{i\theta_{iA}}|\alpha_i\rangle , \tag{D8}$$

and a similar relation for B. (It is this last step which cannot be taken for the Bell-state dephasing case mentioned in the text.) Now, referring to Eq. (3), considering cases $i = 1, 2, 4$ shows that $U_{mA}$ is diagonal in the $|0\rangle, |1\rangle, |2\rangle$ basis. Then $i = 8, 9$ shows that $\langle 0|U_{mA}|0\rangle = \langle 1|U_{mA}|1\rangle$, and $i = 6, 7$ shows that $\langle 1|U_{mA}|1\rangle = \langle 2|U_{mA}|2\rangle$. Thus $U_{mA}$, and similarly $U_{mB}$, are proportional to the identity operator. But the identity superoperator can be done without any classical communication, and the argument at the beginning of this appendix shows that this possibility is excluded. This completes the proof for both cases.
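The eigenvector argument following Eq. (D8) can be illustrated numerically. This is not a proof: the Python sketch below only checks, for a few sample unitaries, whether every Alice factor $\alpha_i$ is an eigenvector. It assumes the Alice factors $|1\rangle$, $|0\rangle$, $|2\rangle$, $|1\pm2\rangle$, $|0\pm1\rangle$ for Eq. (3), consistent with the case analysis above. The helper names are hypothetical.

```python
import cmath
import math

S = 1 / math.sqrt(2)
# Assumed distinct Alice factors alpha_i of Eq. (3): |1>, |0>, |2>, |1±2>, |0±1>.
ALICE = [(0, 1, 0), (1, 0, 0), (0, 0, 1),
         (0, S, S), (0, S, -S), (S, S, 0), (S, -S, 0)]


def apply(u, v):
    """Matrix-vector product for a 3x3 nested-list matrix."""
    return [sum(u[r][c] * v[c] for c in range(3)) for r in range(3)]


def is_eigenvector(u, v, tol=1e-12):
    """True if u v = lam v for some scalar lam (v assumed nonzero)."""
    w = apply(u, v)
    k = max(range(3), key=lambda i: abs(v[i]))  # a component where v is nonzero
    lam = w[k] / v[k]
    return all(abs(w[i] - lam * v[i]) < tol for i in range(3))


def fixes_all_alice_states(u):
    """The condition of Eq. (D8): every alpha_i is an eigenvector of U_mA."""
    return all(is_eigenvector(u, v) for v in ALICE)


def diag(*d):
    """3x3 diagonal matrix with the given entries."""
    return [[d[r] if r == c else 0 for c in range(3)] for r in range(3)]


phase = cmath.exp(0.7j)
print(fixes_all_alice_states(diag(phase, phase, phase)))  # True
print(fixes_all_alice_states(diag(1, 1, -1)))             # False: fails on |1±2>
```

A global phase passes, while relative phases between diagonal entries or any basis permutation fail. This matches the conclusion that $U_{mA}$ must be proportional to the identity.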